Anthropic Prompt Caching for System Prompts
If your AI app has a long system prompt and a high volume of requests, you are paying for those system prompt tokens on every single call. Anthropic's prompt caching feature dramatically reduces that cost — up to 90 percent for cached portions of the prompt. Most builders do not enable it because they have not heard of it. This guide is the heads-up.
Use the AI cost calculator to model your spend with and without prompt caching enabled.
How Prompt Caching Works
Normally, every API request to Claude includes the full prompt (system prompt + user message) and you pay the full per-token rate for every token. Prompt caching tells Anthropic: "I send this same chunk of text on every request — cache it, and charge me less for it."
The first request that defines a cache hits a slightly higher rate (cache write). Every subsequent request that uses the same cached content hits a much lower rate (cache read) — typically about 10 percent of the normal input token cost. The cache lives for 5 minutes by default and is refreshed every time it is used.
When Prompt Caching Pays Off
Prompt caching is worth enabling when:
- Your system prompt is long — 1,024+ tokens minimum to be eligible for caching
- Your system prompt is stable — cached content must match exactly between requests
- You make frequent requests — cache is most efficient when many requests hit the same cached chunk within 5 minutes
If your system prompt is 200 tokens, caching does not help — it is below the minimum eligibility. If your system prompt changes per request, caching does not help — every request is a cache miss. If you make one request per hour, the cache expires before reuse.
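The three conditions above can be folded into a single break-even check. A minimal sketch in Python, assuming Anthropic's published multipliers (roughly 1.25x the base input price for a cache write, 0.1x for a cache read); the function name and defaults are illustrative, not part of any SDK:

```python
def caching_worth_it(prompt_tokens: int,
                     reuses_within_ttl: float,
                     write_premium: float = 0.25,
                     read_discount: float = 0.90) -> bool:
    """True when expected read savings outweigh the one-time write premium.

    write_premium: extra fraction of the base input price paid on a cache write
    read_discount: fraction of the base input price saved on each cache read
    (0.25 and 0.90 reflect Anthropic's ~1.25x write / ~0.1x read rates)
    """
    if prompt_tokens < 1024:  # below the minimum cacheable size
        return False
    return reuses_within_ttl * read_discount > write_premium
```

Note that even a single reuse within the 5-minute window covers the write premium: you save 90% of the base rate once, having paid only a 25% surcharge up front.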
How to Enable Prompt Caching
In your Anthropic API call, add a cache_control breakpoint to the system message:
```json
"system": [
  {
    "type": "text",
    "text": "Your long system prompt here...",
    "cache_control": { "type": "ephemeral" }
  }
]
```
Everything up to and including the block marked with cache_control gets cached. You can place multiple breakpoints (Anthropic supports up to four) to cache some parts of the prompt but not others. Most apps just cache the full system prompt.
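In the Python SDK the same breakpoint looks like this. This is a sketch of the request payload only: `build_cached_request` is a hypothetical helper, and the model name is an example; pass the resulting dict to `anthropic.Anthropic().messages.create(**payload)`.

```python
def build_cached_request(system_prompt: str, user_message: str) -> dict:
    """Assemble a messages.create payload with a cache_control breakpoint.

    Hypothetical helper: pass the result to
    anthropic.Anthropic().messages.create(**payload).
    """
    return {
        "model": "claude-3-5-sonnet-latest",  # example model name
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": system_prompt,
                # everything up to and including this block is cached (~5 min TTL)
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }
```

The response's `usage` object reports `cache_creation_input_tokens` and `cache_read_input_tokens`, which is the easiest way to confirm the cache is actually being hit.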
Cost Math With and Without Caching
Example: a 2,000-token system prompt at Claude 3.5 Sonnet pricing.
- Without caching: $3 per million input tokens × 2,000 tokens × 10,000 daily requests = $60/day = $1,800/month just for system prompt tokens.
- With caching: the first request pays the cache write (~25% premium), and the remaining 9,999 pay the cache read rate (~10% of the normal rate). Total ≈ $180/month under ideal cache hits, a roughly 90% reduction.
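The arithmetic above can be checked directly. A simplified sketch that charges one cache write per day and cache reads for every other request; real traffic pays a write each time the cache expires, so actual bills land slightly higher:

```python
BASE = 3.00           # $ per million input tokens (Claude 3.5 Sonnet)
WRITE = BASE * 1.25   # cache write rate (~25% premium)
READ = BASE * 0.10    # cache read rate (~10% of base)

TOKENS = 2_000        # system prompt size
REQUESTS = 10_000     # requests per day

without = TOKENS * REQUESTS * BASE / 1e6 * 30
with_cache = (TOKENS * WRITE / 1e6                    # one cache write per day
              + TOKENS * (REQUESTS - 1) * READ / 1e6  # cache reads for the rest
              ) * 30

print(f"without caching: ${without:,.0f}/mo, with caching: ${with_cache:,.0f}/mo")
```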
For a high-volume app, prompt caching is the single biggest cost optimization available. Use the AI cost calculator to model your specific scenario.
OpenAI Has Automatic Prompt Caching Now Too
OpenAI added automatic prompt caching for prompts over 1,024 tokens. You do not have to do anything — the API caches eligible prompts automatically and gives you a 50% discount on cached input tokens. This is less aggressive than Anthropic's 90% but easier to use (no setup required).
If you are running on OpenAI, just make sure your system prompts are over 1,024 tokens to qualify. If they are shorter, padding up to the threshold only pays off when your cache hit rate is high enough to offset the extra padded tokens; for very short prompts the discount can never recover the padding cost.
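That trade-off can be made concrete. A minimal sketch, assuming OpenAI's 50% discount on cached input tokens and a 1,024-token threshold (the function is illustrative, not part of any SDK), that computes the minimum cache-hit rate at which padding becomes cheaper than leaving the prompt uncached:

```python
def padding_break_even_hit_rate(prompt_tokens: int,
                                threshold: int = 1024,
                                discount: float = 0.5) -> float:
    """Minimum cache-hit rate at which padding a short prompt up to the
    caching threshold beats leaving it uncached.

    Padded cost per request:   threshold * (1 - discount * hit_rate)
    Unpadded cost per request: prompt_tokens (never cached below threshold)
    """
    if prompt_tokens >= threshold:
        return 0.0  # already eligible, no padding needed
    rate = (1 - prompt_tokens / threshold) / discount
    return rate if rate <= 1.0 else float("inf")  # inf: padding never pays
```

Under these assumptions a 900-token prompt breaks even at roughly a 24% hit rate, while a 200-token prompt can never recover the padding cost, which matches the advice above.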
Gemini Has Context Caching Too
Google Gemini supports context caching with a similar opt-in API. The savings are comparable to Anthropic's. The minimum cacheable size is 32K tokens, which is much higher than Anthropic or OpenAI — practical only for apps with very long context (RAG with large knowledge bases, codebases, etc.).
Estimate Your Cached Prompt Savings
Use our cost calculator to model spend with caching on vs off.
