# smith-prompts

Prompt engineering standards for AI interactions with cache optimization. Use when writing AI prompts, optimizing context usage, or structuring AGENTS.md files. Covers prompt caching, token efficiency, and progressive disclosure patterns.

## When & Why to Use This Skill

This Claude skill establishes advanced prompt engineering standards focused on cache optimization and token efficiency. It provides a technical framework for structuring AGENTS.md files and AI prompts to maximize cache hits, potentially reducing costs by 90% and latency by 85% through strategic content ordering and progressive disclosure patterns.

### Use Cases

- Optimizing high-frequency AI agent interactions to maximize prompt cache hits and significantly reduce operational overhead.
- Structuring complex AGENTS.md files using a cache-friendly architecture that separates static instructions from dynamic project data.
- Implementing progressive disclosure and sparse attention patterns to handle large-scale context without exceeding token limits or degrading performance.
- Designing robust structured outputs and tool schemas to ensure consistent AI behavior across different LLM providers like Anthropic, OpenAI, and Gemini.

## Prompt Engineering Standards

- Load if: Writing AI prompts, optimizing context usage
- Prerequisites: @smith-principles/SKILL.md

### CRITICAL: Prompt Caching (Primacy Zone)

Caching reduces costs by up to 90% and latency by up to 85%.

Structure for caching:

- Static content first (methodology, rules)
- Tool definitions in consistent order
- Project context (AGENTS.md, docs)
- Dynamic content last (recent changes)

Cache breakpoints fall roughly every 1024 tokens, and the prefix must be identical, token for token, for a cache hit. A request-shaping sketch follows the list below.

Avoid (each of these breaks the cached prefix):

- Reordering tools between calls
- Injecting dynamic content into static sections
- Modifying cached prefix unnecessarily
- Using Markdown tables (see @smith-skills/SKILL.md; use bullet lists instead)
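
Here is a minimal sketch of this ordering with the Anthropic Messages API, which marks a cache breakpoint with `cache_control`; the model alias and the `STATIC_RULES` source are placeholders, not fixed choices:

```python
from anthropic import Anthropic

client = Anthropic()

# Static methodology and rules: identical on every call, so they can cache.
STATIC_RULES = open("AGENTS.md").read()

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder; any caching-capable model
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": STATIC_RULES,
            # Cache breakpoint: everything up to and including this block
            # is reused across calls with an identical prefix.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    # Dynamic content goes last, after the cached prefix.
    messages=[{"role": "user", "content": "Summarize the latest diff."}],
)
```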

### AGENTS.md Cache-Friendly Structure

```xml
<!-- STATIC - cached -->
<metadata>
Scope, Load if, Prerequisites
</metadata>

<required>
Critical NEVER/ALWAYS rules
</required>

<forbidden>
Anti-patterns
</forbidden>

<!-- CACHE BREAKPOINT (~1024 tokens) -->

<!-- DYNAMIC - not cached -->
<examples>
Code examples that evolve
</examples>
```

### Token Efficiency

#### Progressive Disclosure

Three-level loading:

1. Metadata only (~50 tokens)
2. Core concepts when triggered (~200 tokens)
3. Full details when accessed (1000+ tokens)
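
A minimal sketch of such a loader; the per-level file names (`METADATA.md`, `CORE.md`, `SKILL.md`) and directory layout are hypothetical:

```python
from pathlib import Path

def load_skill(skill_dir: Path, level: int = 1) -> str:
    """Progressively disclose a skill: start cheap, expand only on demand."""
    parts = ["METADATA.md"]        # level 1: metadata only (~50 tokens)
    if level >= 2:
        parts.append("CORE.md")    # level 2: core concepts (~200 tokens)
    if level >= 3:
        parts.append("SKILL.md")   # level 3: full details (1000+ tokens)
    return "\n\n".join((skill_dir / name).read_text() for name in parts)
```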

#### Sparse Attention

Efficient file reading:

- Grep to find location
- Read with offset/limit for large files
- Read only necessary context (±20 lines)

Avoid:

- Loading full files when targeted reads suffice
- Reading documentation when metadata answers the question
- Repeating user's question in responses
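
A sketch of the grep-then-read pattern in plain Python; `read_context` is a hypothetical helper, not a standard tool:

```python
import re
from pathlib import Path

def read_context(path: Path, pattern: str, radius: int = 20) -> str:
    """Locate a pattern, then return only the ±radius lines around the first hit."""
    lines = path.read_text().splitlines()
    for i, line in enumerate(lines):
        if re.search(pattern, line):
            lo = max(0, i - radius)
            hi = min(len(lines), i + radius + 1)
            # Emit file:line references so the snippet stays addressable.
            return "\n".join(
                f"{path}:{n + 1}: {text}"
                for n, text in enumerate(lines[lo:hi], start=lo)
            )
    return ""

# Usage: read_context(Path("src/app.py"), r"def handle_request")
```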

### Structured Output

Platform mechanisms:

- OpenAI: JSON Schema with `strict: true` (100% schema compliance)
- Anthropic: Tool use with flexible schemas
- Gemini: `responseSchema` with retry

Schema design:
- Match existing project patterns
- Include descriptions for complex fields
- Define required vs optional fields
- Keep nesting ≤3 levels
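
A sketch combining OpenAI strict mode with the schema-design rules above; the model name and the schema itself are illustrative:

```python
from openai import OpenAI

client = OpenAI()

# Flat schema: descriptions on non-obvious fields, explicit required list,
# nesting well under three levels.
BUILD_REPORT = {
    "type": "object",
    "properties": {
        "status": {"type": "string", "enum": ["pass", "fail"]},
        "errors": {
            "type": "array",
            "items": {"type": "string"},
            "description": "Human-readable error messages; empty when status is pass",
        },
    },
    "required": ["status", "errors"],  # strict mode lists every property
    "additionalProperties": False,     # also required by strict mode
}

response = client.chat.completions.create(
    model="gpt-4o-mini",  # illustrative; any model with structured-output support
    messages=[{"role": "user", "content": "Summarize this build log: ..."}],
    response_format={
        "type": "json_schema",
        "json_schema": {"name": "build_report", "strict": True, "schema": BUILD_REPORT},
    },
)
```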

Related skills:

- @smith-ctx/SKILL.md - Progressive disclosure, reference-based communication
- @smith-xml/SKILL.md - Approved XML tags

### ACTION (Recency Zone)

For caching:

- Place static content before dynamic
- Maintain consistent tool order
- Target a >80% cache hit rate (see the measurement sketch below)

For efficiency:

- Use Grep before Read
- Read incrementally (narrow → expand)
- Use file:line references
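
To check the hit-rate target, here is a minimal sketch that reads the cache counters reported in the Messages API usage block; the field names follow Anthropic's prompt-caching docs, but treat them as an assumption if your SDK version differs:

```python
def cache_hit_rate(usage) -> float:
    """Fraction of input tokens served from cache, given a response.usage object."""
    cached = getattr(usage, "cache_read_input_tokens", 0) or 0
    written = getattr(usage, "cache_creation_input_tokens", 0) or 0
    fresh = getattr(usage, "input_tokens", 0) or 0
    total = cached + written + fresh
    return cached / total if total else 0.0

# Usage: cache_hit_rate(response.usage) -> e.g. 0.87; target > 0.80
```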