
Anthropic Prompt Caching for System Prompts

Last updated: April 2026 · 6 min read

Table of Contents

  1. How prompt caching works
  2. When it pays off
  3. How to enable it
  4. Cost math
  5. OpenAI auto-cache
  6. Gemini context caching

If your AI app has a long system prompt and a high volume of requests, you are paying for those system prompt tokens on every single call. Anthropic's prompt caching feature dramatically reduces that cost — up to 90 percent for cached portions of the prompt. Most builders do not enable it because they have not heard of it. This guide is the heads-up.

Use the AI cost calculator to model your spend with and without prompt caching enabled.

How Prompt Caching Works

Normally, every API request to Claude includes the full prompt (system prompt + user message) and you pay the full per-token rate for every token. Prompt caching tells Anthropic: "I send this same chunk of text on every request — cache it, and charge me less for it."

The first request that defines a cache hits a slightly higher rate (cache write). Every subsequent request that uses the same cached content hits a much lower rate (cache read) — typically about 10 percent of the normal input token cost. The cache lives for 5 minutes by default and is refreshed every time it is used.

When Prompt Caching Pays Off

Prompt caching is worth enabling when:

  - The prefix you cache is long enough to qualify: at least 1,024 tokens for most Claude models (2,048 for the Haiku models).
  - That prefix is identical on every request; any change invalidates the cache.
  - Requests arrive often enough to reuse the cache within its 5-minute lifetime.

If your system prompt is 200 tokens, caching does not help: it is below the minimum eligibility. If your system prompt changes per request, caching does not help: every request is a cache miss. If you make one request per hour, the cache expires before it is reused.
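The three conditions above can be folded into a rough break-even check. This is a sketch, not an official formula: the function name is our own, and the pricing constants assume Claude 3.5 Sonnet rates ($3.00 per million input tokens, $3.75 per million for cache writes, $0.30 per million for cache reads).

```python
def caching_saves_money(prompt_tokens: int, requests_per_5_min: float,
                        min_cacheable: int = 1024) -> bool:
    """Rough check: does prompt caching reduce cost for this workload?

    Rates are in dollars per million tokens (Claude 3.5 Sonnet assumed):
    input 3.00, cache write 3.75, cache read 0.30.
    """
    if prompt_tokens < min_cacheable:
        return False  # below the minimum cacheable size
    if requests_per_5_min < 1:
        return False  # cache expires before it is ever reused
    n = requests_per_5_min
    uncached = prompt_tokens * 3.00 * n                  # every request pays full rate
    cached = (prompt_tokens * 3.75                       # one cache write...
              + prompt_tokens * 0.30 * (n - 1))          # ...then cache reads
    return cached < uncached
```

With these rates the write premium is recovered after roughly two requests, so almost any steady traffic qualifies.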

How to Enable Prompt Caching

In your Anthropic API call, add a cache_control breakpoint to the system message:

system: [
  {
    type: "text",
    text: "Your long system prompt here...",
    cache_control: { type: "ephemeral" }
  }
]

Everything up to and including the block marked with cache_control gets cached. You can place up to four breakpoints if you want to cache some parts of the prompt but not others; most apps just cache the full system prompt.
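In Python with the official anthropic SDK, a minimal version looks like the sketch below. The model name, prompt text, and helper function are placeholders; the actual network call is shown commented out since it needs an API key.

```python
# Sketch of a cached request via the official `anthropic` Python SDK.
# LONG_SYSTEM_PROMPT and build_request are illustrative names.
LONG_SYSTEM_PROMPT = "Your long system prompt here... " * 200  # must exceed the minimum cacheable size

def build_request(user_message: str) -> dict:
    """Build kwargs for client.messages.create() with the system prompt cached."""
    return {
        "model": "claude-3-5-sonnet-latest",
        "max_tokens": 1024,
        "system": [
            {
                "type": "text",
                "text": LONG_SYSTEM_PROMPT,
                # Everything up to and including this block is cached.
                "cache_control": {"type": "ephemeral"},
            }
        ],
        "messages": [{"role": "user", "content": user_message}],
    }

# Real call (requires ANTHROPIC_API_KEY in the environment):
#   import anthropic
#   client = anthropic.Anthropic()
#   response = client.messages.create(**build_request("Hello"))
#   # response.usage.cache_creation_input_tokens and
#   # response.usage.cache_read_input_tokens show whether the request
#   # wrote to or read from the cache.
```

The usage fields in the response are the easiest way to confirm caching is actually kicking in: the first request reports cache-creation tokens, later ones report cache-read tokens.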


Cost Math With and Without Caching

Example: a 2,000-token system prompt at Claude 3.5 Sonnet pricing.
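Plugging in Claude 3.5 Sonnet rates ($3.00 per million input tokens, $3.75 per million for cache writes, $0.30 per million for cache reads), the math works out as below. The 100,000-request volume is an arbitrary illustration, and the cached case assumes the cache never expires between requests.

```python
PROMPT_TOKENS = 2_000
REQUESTS = 100_000

# Claude 3.5 Sonnet rates, dollars per million tokens.
INPUT_RATE = 3.00
CACHE_WRITE_RATE = 3.75   # 25% premium on the first request
CACHE_READ_RATE = 0.30    # ~10% of the normal input rate

# Without caching: every request pays the full input rate.
without = PROMPT_TOKENS * REQUESTS * INPUT_RATE / 1_000_000

# With caching: one cache write, then cache reads for every other request.
with_caching = (PROMPT_TOKENS * CACHE_WRITE_RATE
                + PROMPT_TOKENS * (REQUESTS - 1) * CACHE_READ_RATE) / 1_000_000

print(f"without caching: ${without:,.2f}")       # $600.00
print(f"with caching:    ${with_caching:,.2f}")  # $60.01
```

That is a roughly 90 percent reduction on the system-prompt portion of the bill, matching the cache-read rate.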

For a high-volume app, prompt caching is the single biggest cost optimization available. Use the AI cost calculator to model your specific scenario.

OpenAI Has Automatic Prompt Caching Now Too

OpenAI added automatic prompt caching for prompts over 1,024 tokens. You do not have to do anything — the API caches eligible prompts automatically and gives you a 50% discount on cached input tokens. This is less aggressive than Anthropic's 90% but easier to use (no setup required).
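You can confirm the discount is being applied from the response's usage payload: Chat Completions reports cached tokens under prompt_tokens_details. The payload below is an illustrative sample with made-up numbers, and the helper function is our own.

```python
# Illustrative usage payload in the shape Chat Completions returns;
# the numbers are made up.
usage = {
    "prompt_tokens": 2048,
    "completion_tokens": 120,
    "prompt_tokens_details": {"cached_tokens": 1920},
}

def cached_fraction(usage: dict) -> float:
    """Fraction of prompt tokens that were served from OpenAI's cache."""
    details = usage.get("prompt_tokens_details") or {}
    return details.get("cached_tokens", 0) / usage["prompt_tokens"]

print(f"{cached_fraction(usage):.0%}")  # 94%
```

A cached_tokens value of 0 on every request usually means the prompt is under the 1,024-token threshold or the prefix is changing between calls.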

If you are running on OpenAI, just make sure your system prompts are at least 1,024 tokens to qualify. If they are shorter, padding to hit the threshold is rarely worth it: the savings from caching are usually smaller than the extra cost of the padding tokens.

Gemini Has Context Caching Too

Google Gemini supports context caching with a similar opt-in API. The savings are comparable to Anthropic's. The minimum cacheable size is 32K tokens, which is much higher than Anthropic or OpenAI — practical only for apps with very long context (RAG with large knowledge bases, codebases, etc.).

Estimate Your Cached Prompt Savings

Use our cost calculator to model spend with caching on vs off.
