
How to Count Tokens Before Sending to an LLM API

Last updated: April 2026 · 6 min read · AI Tools

Counting tokens before you send a prompt to an LLM API is a five-second habit that prevents three problems: context window errors, surprise costs, and slow responses. Here is exactly how to do it for any model in 2026.

The Three Reasons to Count First

1. Context window errors. Every LLM has a hard token limit per request. GPT-4o is 128K. Claude is 200K. Gemini 2.5 Pro is 2M. Send more than the limit and the API rejects your call with an error. Counting first catches this before you waste a network round-trip.

2. Cost surprise prevention. A 50K-token input on GPT-4o costs $0.125 (at $2.50 per million input tokens). On Claude Opus 4 it costs $0.75 (at $15 per million). If you don't know the input size before sending, you don't know the cost. Counting first removes the surprise.

3. Model routing. If your input is small (under 4K tokens), a cheap model is fine. If your input is huge (50K+), you need a model with a large context window — and you may want to use a cheaper model to control cost. Counting first lets you route intelligently.
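A minimal routing sketch based on the numbers above. The model names and thresholds here are illustrative assumptions, not fixed recommendations:

```javascript
// Route a request to a model tier based on input size.
// Thresholds leave headroom below each context window.
function pickModel(inputTokens) {
  if (inputTokens < 4_000) return "gpt-4o-mini";       // small input: cheap model
  if (inputTokens < 120_000) return "gpt-4o";          // fits 128K with headroom
  if (inputTokens < 190_000) return "claude-sonnet-4"; // 200K window
  return "gemini-2.5-pro";                             // largest window available
}
```

The exact cutoffs are a business decision: tighten them if you reserve more tokens for output, and weigh per-token price against quality when two models both fit.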

The 5-Second Method

For one-off checks:

  1. Open the Token Counter
  2. Paste your prompt + system message + any retrieved context + chat history
  3. Read the token count
  4. Compare to your model's context window minus your expected output size

If it fits, send. If it doesn't, truncate or switch models.

Check token count for any prompt in 5 seconds.

Open Token Counter →

The In-Code Method

For production systems, count programmatically before each API call:

// Pseudocode for any LLM — countTokens, MODELS, truncateToFit, and
// callAPI are placeholders for your tokenizer, config, and API client
function safeCallLLM(systemPrompt, userPrompt, model) {
  const systemTokens = countTokens(systemPrompt);
  const totalTokens = systemTokens + countTokens(userPrompt);
  const contextLimit = MODELS[model].contextWindow;
  const reservedForOutput = 4000; // leave room for the response

  if (totalTokens > contextLimit - reservedForOutput) {
    // Option 1: truncate
    userPrompt = truncateToFit(userPrompt, contextLimit - reservedForOutput - systemTokens);
    // Option 2: switch model
    // model = pickLargerModel(totalTokens);
    // Option 3: chunk and process separately
  }

  return callAPI(systemPrompt, userPrompt, model);
}

Common token counting libraries:

  - tiktoken (Python) — the official tokenizer for OpenAI models
  - gpt-tokenizer and js-tiktoken (JavaScript) — tiktoken ports for Node and the browser
  - Anthropic's count_tokens API endpoint — exact counts for Claude models
  - The Gemini API's countTokens method — exact counts for Gemini models
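When no tokenizer library is available, a character-based approximation is a common fallback: roughly 4 characters per token for English text. A sketch (the ratio is a rule of thumb, not an exact count):

```javascript
// Rough token estimate: ~4 characters per token for English prose.
// Real counts vary by model and content; use a tokenizer when exactness matters.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}
```

This is good enough for routing and budget checks, but rely on the model's own tokenizer before trusting a count near the limit.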

Context Window Sizes by Model (April 2026)

| Model            | Context window | Reserve for output | Safe input limit |
| ---------------- | -------------- | ------------------ | ---------------- |
| GPT-4o           | 128K           | 4K                 | 124K             |
| GPT-4.1          | 1M             | 8K                 | 992K             |
| GPT-4o mini      | 128K           | 4K                 | 124K             |
| Claude Haiku 3.5 | 200K           | 4K                 | 196K             |
| Claude Sonnet 4  | 200K           | 8K                 | 192K             |
| Claude Opus 4    | 200K           | 8K                 | 192K             |
| Gemini 2.0 Flash | 1M             | 8K                 | 992K             |
| Gemini 2.5 Flash | 1M             | 8K                 | 992K             |
| Gemini 2.5 Pro   | 2M             | 8K                 | 1.99M            |
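The table above can be expressed as a lookup in code. The figures below mirror a few rows of the table and should be verified against each provider's current docs:

```javascript
// Safe input limit = context window minus tokens reserved for output.
const MODELS = {
  "gpt-4o":         { contextWindow: 128_000,   reservedForOutput: 4_000 },
  "claude-opus-4":  { contextWindow: 200_000,   reservedForOutput: 8_000 },
  "gemini-2.5-pro": { contextWindow: 2_000_000, reservedForOutput: 8_000 },
};

function safeInputLimit(model) {
  const { contextWindow, reservedForOutput } = MODELS[model];
  return contextWindow - reservedForOutput;
}

function fits(model, inputTokens) {
  return inputTokens <= safeInputLimit(model);
}
```
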

What to Do When You're Over the Limit

Strategy 1 — Truncate. Drop the oldest content first (chat history, less relevant retrieved context). Keep the most recent and most relevant.
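A minimal sketch of drop-oldest truncation, assuming chat history is an array of strings and approximating token counts with a length/4 heuristic (swap in a real tokenizer in production):

```javascript
// Illustrative stand-in for a real tokenizer.
const countTokens = (text) => Math.ceil(text.length / 4);

// Keep the most recent messages that fit within the token budget,
// dropping the oldest first.
function truncateHistory(messages, budget) {
  const kept = [];
  let used = 0;
  for (let i = messages.length - 1; i >= 0; i--) {
    const cost = countTokens(messages[i]);
    if (used + cost > budget) break;
    kept.unshift(messages[i]); // preserve chronological order
    used += cost;
  }
  return kept;
}
```

Walking backwards from the newest message guarantees the most recent context survives, which matches how users expect a chat to degrade.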

Strategy 2 — Summarize. Replace long history or context with a shorter summary. Cost: one extra API call to generate the summary, then use the summary instead of the raw content.

Strategy 3 — Switch models. If your input is 150K tokens, GPT-4o (128K) won't fit. Switch to Claude (200K) or Gemini (1M+).

Strategy 4 — Chunk and process separately. Split the input into manageable pieces, process each, then combine results. Most common for very long documents.
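A naive chunking sketch using the same length/4 character approximation. Real systems usually split on paragraph or sentence boundaries rather than raw character offsets:

```javascript
// Split text into chunks that each fit within a token budget,
// approximating tokens as length / 4 characters.
function chunkText(text, maxTokensPerChunk) {
  const maxChars = maxTokensPerChunk * 4;
  const chunks = [];
  for (let i = 0; i < text.length; i += maxChars) {
    chunks.push(text.slice(i, i + maxChars));
  }
  return chunks;
}
```
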

Strategy 5 — Use embeddings + retrieval. Instead of sending all context, embed it once, then retrieve the most relevant chunks per query. Reduces input from 200K tokens to 5K tokens for most queries.
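A bare-bones sketch of the retrieval step, assuming chunk embeddings have already been computed (the tiny hand-made vectors here stand in for real embedding-API output):

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Return the k chunks most similar to the query embedding.
function topK(queryVec, chunks, k) {
  return chunks
    .map((c) => ({ ...c, score: cosine(queryVec, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

Only the top-k chunk texts go into the prompt, which is how a 200K-token corpus turns into a 5K-token input per query.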

The Token Budget Mental Model

Think of every API call as a budget. The budget is your context window. You spend tokens on:

  1. The system prompt
  2. The user message
  3. Retrieved context (if using RAG)
  4. Chat history
  5. Space reserved for the model's output

Add them up before each call. Make sure they fit. If they don't, drop the lowest-priority item or switch models.
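The tally takes a few lines of code. The item sizes here are illustrative:

```javascript
// Sum every token-consuming item in the request budget.
function totalInput(budget) {
  return Object.values(budget).reduce((sum, t) => sum + t, 0);
}

const budget = {
  systemPrompt: 400,
  userMessage: 900,
  retrievedContext: 3_000,
  chatHistory: 1_500,
};
// totalInput(budget) plus your reserved output must fit the context window.
```
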

The 30-Second Pre-Send Habit

Before you send any prompt:

  1. Estimate your input size in your head (200 words ≈ 260 tokens)
  2. Add system prompt (usually 200-500 tokens)
  3. Add retrieved context if RAG (usually 1K-5K tokens)
  4. Add chat history if applicable (~300 tokens per turn)
  5. Compare to your model's safe input limit
  6. If within limit, send. If not, truncate or switch.
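The steps above boil down to one arithmetic estimate. All ratios below are the checklist's rules of thumb (≈1.3 tokens per word, ≈300 tokens per turn), and the default system-prompt size is an assumed midpoint of the 200-500 range:

```javascript
// Rough pre-send input estimate from the 30-second checklist.
function preSendEstimate({ words, systemTokens = 350, ragTokens = 0, turns = 0 }) {
  return Math.ceil(words * 1.3) // user message: ~1.3 tokens per word
    + systemTokens              // system prompt
    + ragTokens                 // retrieved context, if any
    + turns * 300;              // chat history: ~300 tokens per turn
}
```
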

Or just paste it all into the Token Counter and skip the math.

Count tokens for any prompt before sending. Free, instant.

Open Token Counter →