Anthropic Prompt Caching for System Prompts
If your AI app has a long system prompt and a high volume of requests, you are paying for those system prompt tokens on every single call. Anthropic's prompt caching feature dramatically reduces that cost — up to 90 percent for cached portions of the prompt. Most builders do not enable it because they have not heard of it. This guide is the heads-up.
Use the AI cost calculator to model your spend with and without prompt caching enabled.
How Prompt Caching Works
Normally, every API request to Claude includes the full prompt (system prompt + user message) and you pay the full per-token rate for every token. Prompt caching tells Anthropic: "I send this same chunk of text on every request — cache it, and charge me less for it."
The first request that defines a cache hits a slightly higher rate (cache write). Every subsequent request that uses the same cached content hits a much lower rate (cache read) — typically about 10 percent of the normal input token cost. The cache lives for 5 minutes by default and is refreshed every time it is used.
When Prompt Caching Pays Off
Prompt caching is worth enabling when:
- Your system prompt is long — 1,024+ tokens minimum to be eligible for caching
- Your system prompt is stable — cached content must match exactly between requests
- You make frequent requests — cache is most efficient when many requests hit the same cached chunk within 5 minutes
If your system prompt is 200 tokens, caching does not help — it is below the minimum eligibility. If your system prompt changes per request, caching does not help — every request is a cache miss. If you make one request per hour, the cache expires before reuse.
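The three conditions above can be folded into a single break-even check. A minimal sketch in Python, assuming Anthropic's published multipliers (roughly 1.25x the base input price for a cache write, 0.1x for a cache read); the function name and defaults are illustrative, not part of any SDK:

```python
def caching_worth_it(prompt_tokens: int,
                     reuses_within_ttl: float,
                     write_premium: float = 0.25,
                     read_discount: float = 0.90) -> bool:
    """True when expected read savings outweigh the one-time write premium.

    write_premium: extra fraction of the base input price paid on a cache write
    read_discount: fraction of the base input price saved on each cache read
    (0.25 and 0.90 reflect Anthropic's ~1.25x write / ~0.1x read rates)
    """
    if prompt_tokens < 1024:  # below the minimum cacheable size
        return False
    return reuses_within_ttl * read_discount > write_premium
```

Note that even a single reuse within the 5-minute window covers the write premium: you save 90% of the base rate once, having paid only a 25% surcharge up front.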
How to Enable Prompt Caching
In your Anthropic API call, add a cache_control breakpoint to the system message:
```json
"system": [
  {
    "type": "text",
    "text": "Your long system prompt here...",
    "cache_control": { "type": "ephemeral" }
  }
]
```
Everything up to and including the block marked with cache_control gets cached. You can place multiple breakpoints (Anthropic supports up to four) to cache some parts of the prompt but not others. Most apps just cache the full system prompt.
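In the Python SDK the same breakpoint looks like this. This is a sketch of the request payload only: `build_cached_request` is a hypothetical helper, and the model name is an example; pass the resulting dict to `anthropic.Anthropic().messages.create(**payload)`.

```python
def build_cached_request(system_prompt: str, user_message: str) -> dict:
    """Assemble a messages.create payload with a cache_control breakpoint.

    Hypothetical helper: pass the result to
    anthropic.Anthropic().messages.create(**payload).
    """
    return {
        "model": "claude-3-5-sonnet-latest",  # example model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # everything up to and including this block is cached (~5 min TTL)
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }
```

The response's `usage` object reports `cache_creation_input_tokens` and `cache_read_input_tokens`, which is the easiest way to confirm the cache is actually being hit.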
Cost Math With and Without Caching
Example: a 2,000-token system prompt at Claude 3.5 Sonnet pricing.
- Without caching: $3 per million input tokens × 2,000 tokens × 10,000 daily requests = $60/day = $1,800/month just for system prompt tokens.
- With caching: the first request pays the cache write (~25% premium), and the remaining 9,999 pay the cache read rate (~10% of the normal rate). Total ≈ $180/month under ideal cache hits, a roughly 90% reduction.
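The arithmetic above can be checked directly. A simplified sketch that charges one cache write per day and cache reads for every other request; real traffic pays a write each time the cache expires, so actual bills land slightly higher:

```python
BASE = 3.00           # $ per million input tokens (Claude 3.5 Sonnet)
WRITE = BASE * 1.25   # cache write rate (~25% premium)
READ = BASE * 0.10    # cache read rate (~10% of base)

TOKENS = 2_000        # system prompt size
REQUESTS = 10_000     # requests per day

without = TOKENS * REQUESTS * BASE / 1e6 * 30
with_cache = (TOKENS * WRITE / 1e6                    # one cache write per day
              + TOKENS * (REQUESTS - 1) * READ / 1e6  # cache reads for the rest
              ) * 30

print(f"without caching: ${without:,.0f}/mo, with caching: ${with_cache:,.0f}/mo")
```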
For a high-volume app, prompt caching is the single biggest cost optimization available. Use the AI cost calculator to model your specific scenario.
OpenAI Has Automatic Prompt Caching Now Too
OpenAI added automatic prompt caching for prompts over 1,024 tokens. You do not have to do anything — the API caches eligible prompts automatically and gives you a 50% discount on cached input tokens. This is less aggressive than Anthropic's 90% but easier to use (no setup required).
If you are running on OpenAI, just make sure your system prompts are over 1,024 tokens to qualify. If they are shorter, padding up to the threshold only pays off when your cache hit rate is high enough to offset the extra padded tokens; for very short prompts the discount can never recover the padding cost.
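That trade-off can be made concrete. A minimal sketch, assuming OpenAI's 50% discount on cached input tokens and a 1,024-token threshold (the function is illustrative, not part of any SDK), that computes the minimum cache-hit rate at which padding becomes cheaper than leaving the prompt uncached:

```python
def padding_break_even_hit_rate(prompt_tokens: int,
                                threshold: int = 1024,
                                discount: float = 0.5) -> float:
    """Minimum cache-hit rate at which padding a short prompt up to the
    caching threshold beats leaving it uncached.

    Padded cost per request:   threshold * (1 - discount * hit_rate)
    Unpadded cost per request: prompt_tokens (never cached below threshold)
    """
    if prompt_tokens >= threshold:
        return 0.0  # already eligible, no padding needed
    rate = (1 - prompt_tokens / threshold) / discount
    return rate if rate <= 1.0 else float("inf")  # inf: padding never pays
```

Under these assumptions a 900-token prompt breaks even at roughly a 24% hit rate, while a 200-token prompt can never recover the padding cost, which matches the advice above.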
Gemini Has Context Caching Too
Google Gemini supports context caching with a similar opt-in API. The savings are comparable to Anthropic's. The minimum cacheable size is 32K tokens, which is much higher than Anthropic or OpenAI — practical only for apps with very long context (RAG with large knowledge bases, codebases, etc.).
Estimate Your Cached Prompt Savings
Use our cost calculator to model spend with caching on vs off.
