Most teams discover their LLM bill is 3x what they expected, and only after the first month's invoice. The fix is a 5-minute estimate before you write a line of code. Here is the exact method that gets you within 20% of your real bill.
Forget complicated spreadsheets. You only need three inputs to estimate any LLM workload:

1. Average input tokens per request
2. Average output tokens per request
3. Requests per day
Get those three numbers right, and you can predict any model's monthly bill in under a minute.
Input tokens include everything you send: system prompt, conversation history, retrieved context (for RAG), and the user's current message. The big gotcha is conversation history — chatbots send the entire history with every message, which compounds fast.
| Use case | Typical input tokens |
|---|---|
| Single-shot Q&A (no history) | 100 - 500 |
| Chatbot (5 turn history) | 800 - 2,000 |
| Chatbot (long history) | 2,000 - 6,000 |
| Document summarization (1 page) | 1,500 - 3,000 |
| Document summarization (10 pages) | 15,000 - 30,000 |
| RAG with 5 retrieved chunks | 1,500 - 4,000 |
| Code generation (with file context) | 2,000 - 8,000 |
| Long-context analysis (full doc) | 20,000 - 100,000+ |
If you do not know yet, write a sample prompt and measure it with a tokenizer such as OpenAI's tiktoken. Then multiply by the number of turns of history you expect to carry.
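If you just want a ballpark before reaching for a real tokenizer, a common rule of thumb is roughly 4 characters per token for English text. Here is a minimal sketch of that heuristic (the function name and ratio are illustrative; use your provider's actual tokenizer for anything precise):

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token count via the ~4-chars-per-token heuristic for English."""
    return max(1, round(len(text) / chars_per_token))

prompt = "You are a helpful assistant. Summarize the following document."
print(estimate_tokens(prompt))  # a rough count, typically within ~20% for English prose
```

Real tokenizers will disagree by a noticeable margin on code, non-English text, or heavy punctuation, so treat this strictly as a first-pass estimate.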
Output is harder to estimate because it varies with the model, the prompt, and the task. Use these rough ranges:
| Output type | Typical output tokens |
|---|---|
| Yes/no or single-word | 5 - 20 |
| Short answer (1-2 sentences) | 30 - 80 |
| Chat response (paragraph) | 100 - 300 |
| Detailed answer | 300 - 800 |
| Article or long-form | 800 - 2,000 |
| Code block (function) | 200 - 800 |
| Code block (full file) | 800 - 3,000 |
| Structured JSON (nested) | 100 - 500 |
Tip: include "respond in under N words" or "limit response to N sentences" in your prompt to reduce output variance. This both lowers cost and tightens estimates.
Plug your three numbers in and get a side-by-side bill for every model.
Open AI Cost Calculator →

Be honest: most projects estimate 10x more traffic than they actually get in month one. Use these starting points:
Multiply daily active users by sessions per user per day, then by messages per session. Most chat apps see 3-10 messages per session and 1-3 sessions per active user per day.
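The traffic math above fits in a one-liner. A quick sketch with illustrative numbers (1,000 active users, 2 sessions each, 6 messages per session):

```python
def daily_requests(active_users: int, sessions_per_user: float, msgs_per_session: float) -> int:
    """Estimated requests per day: users x sessions/user/day x messages/session."""
    return round(active_users * sessions_per_user * msgs_per_session)

print(daily_requests(1_000, 2, 6))  # 12000 requests per day
```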
The formula is:
Monthly cost = ((input_tokens × input_price + output_tokens × output_price) ÷ 1,000,000) × requests_per_day × 30
Example: GPT-4o ($2.50 per million input tokens, $10 per million output) at 1,500 input tokens, 400 output tokens, 2,000 requests per day: $0.00775 per request, $15.50 daily, $465 monthly.
Same workload on GPT-4o mini: $0.000465 per request, $0.93 daily, $27.90 monthly. The cheap tier is roughly 17x cheaper.
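The formula above translates directly into a small helper. This sketch uses the prices from the worked examples ($ per million tokens); plug in whatever rates your provider currently lists:

```python
def monthly_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float,
                 requests_per_day: int, days: int = 30) -> float:
    """Monthly cost in dollars; prices are $ per 1M tokens."""
    per_request = (input_tokens * input_price + output_tokens * output_price) / 1_000_000
    return per_request * requests_per_day * days

# GPT-4o at the article's example rates: 1,500 in / 400 out, 2,000 req/day
print(round(monthly_cost(1_500, 400, 2.50, 10.00, 2_000), 2))  # 465.0
# GPT-4o mini at $0.15 / $0.60 per million
print(round(monthly_cost(1_500, 400, 0.15, 0.60, 2_000), 2))   # 27.9
```

Running both calls side by side reproduces the roughly 17x gap between tiers.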
Your real bill will exceed your estimate. Always.
Add 30% to your estimate. If the buffered number still fits your budget, you can build. If it doesn't, drop to a cheaper model or rework the prompt.
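The buffer rule is one multiplication, but it is worth making the go/no-go decision explicit. A minimal sketch (function name and budget figures are illustrative):

```python
def fits_budget(estimated_monthly: float, budget: float, buffer: float = 0.30) -> bool:
    """Apply the 30% overrun buffer, then check against the monthly budget."""
    return estimated_monthly * (1 + buffer) <= budget

# The $465/month GPT-4o estimate from above, buffered to $604.50:
print(fits_budget(465.0, 700.0))  # True: build it
print(fits_budget(465.0, 500.0))  # False: drop a tier or rework the prompt
```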
The AI Cost Calculator does all of this in one input box. Type your three numbers, hit calculate, and see every major model side-by-side. Built-in presets cover common workloads (chatbot, summarization, code gen, RAG, batch classification) so you can start from a realistic shape and adjust.
Stop guessing your AI bill. Get exact numbers for every model in one click.
Open AI Cost Calculator →