LLM API pricing changes every quarter. The "cheap" model from six months ago might not even be the cheapest anymore. Here is the real ranking as of April 2026, ordered by per-million-token cost across the providers most teams actually consider.
Input tokens are what you send (the prompt). Output tokens are what the model generates (the response). Most workloads use more input than output, but the ratio depends on your use case.
| Model | Provider | Input ($/M) | Output ($/M) | Tier |
|---|---|---|---|---|
| Gemini 2.0 Flash | Google | $0.10 | $0.40 | Cheap |
| GPT-4.1 nano | OpenAI | $0.10 | $0.40 | Cheap |
| Mistral Small | Mistral | $0.10 | $0.30 | Cheap |
| Gemini 2.5 Flash | Google | $0.15 | $0.60 | Cheap |
| GPT-4o mini | OpenAI | $0.15 | $0.60 | Cheap |
| Llama 4 Scout | Meta (hosted) | $0.18 | $0.59 | Cheap |
| DeepSeek V3 | DeepSeek | $0.27 | $1.10 | Cheap |
| Llama 4 Maverick | Meta (hosted) | $0.27 | $0.85 | Cheap |
| Grok 3 mini | xAI | $0.30 | $0.50 | Cheap |
| GPT-4.1 mini | OpenAI | $0.40 | $1.60 | Mid |
| DeepSeek R1 | DeepSeek | $0.55 | $2.19 | Mid |
| Claude Haiku 3.5 | Anthropic | $0.80 | $4.00 | Mid |
| o3 mini | OpenAI | $1.10 | $4.40 | Mid |
| o4 mini | OpenAI | $1.10 | $4.40 | Mid |
| Gemini 2.5 Pro | Google | $1.25 | $10.00 | Mid |
| GPT-4.1 | OpenAI | $2.00 | $8.00 | Premium |
| Mistral Large | Mistral | $2.00 | $6.00 | Premium |
| GPT-4o | OpenAI | $2.50 | $10.00 | Premium |
| Claude Sonnet 4 | Anthropic | $3.00 | $15.00 | Premium |
| Grok 3 | xAI | $3.00 | $15.00 | Premium |
| o3 | OpenAI | $10.00 | $40.00 | Top |
| Claude Opus 4 | Anthropic | $15.00 | $75.00 | Top |
The cheapest model costs roughly 1/150th of the most expensive on input ($0.10 vs. $15.00), and nearly 1/190th on output ($0.40 vs. $75.00). That gap is bigger than most people realize until they see the monthly bill.
Plug in your real usage and get a side-by-side bill for every model.
Open AI Cost Calculator →

Assume 1,000 input tokens and 300 output tokens per request — a typical chatbot exchange. At 3,000 requests per month (about 100 a day), that is 3 million input tokens and 900,000 output tokens.
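As a sanity check, the arithmetic above can be wrapped in a small helper. The prices plugged in below are the table's Gemini 2.0 Flash rates:

```python
def monthly_cost(requests, in_tokens, out_tokens, in_price, out_price):
    """Monthly bill in dollars; prices are $ per million tokens."""
    total_in = requests * in_tokens / 1_000_000    # total input, in millions
    total_out = requests * out_tokens / 1_000_000  # total output, in millions
    return total_in * in_price + total_out * out_price

# 3,000 requests/month at 1,000 input + 300 output tokens each
bill = monthly_cost(3_000, 1_000, 300, 0.10, 0.40)  # Gemini 2.0 Flash rates
print(f"${bill:.2f}/month")  # prints $0.66/month
```

Swap in any row of the pricing table to reproduce the figures below.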
| Model | Monthly cost | Equivalent |
|---|---|---|
| Gemini 2.0 Flash | $0.66 | Cheaper than coffee |
| GPT-4.1 nano | $0.66 | Cheaper than coffee |
| GPT-4o mini | $0.99 | Less than $1/mo |
| Gemini 2.5 Flash | $0.99 | Less than $1/mo |
| DeepSeek V3 | $1.80 | Half a coffee |
| Claude Haiku 3.5 | $6.00 | One Netflix month |
| Gemini 2.5 Pro | $12.75 | One Spotify month |
| GPT-4.1 | $13.20 | One Spotify month |
| GPT-4o | $16.50 | One movie ticket |
| Claude Sonnet 4 | $22.50 | One ChatGPT Plus |
| o3 | $66.00 | One steak dinner |
| Claude Opus 4 | $112.50 | One nice dinner |
For a personal project or low-traffic side app, you can run on a flagship model for under $25/month. For a production app with thousands of daily requests, the cheap tier becomes essential.
A cheap model that generates verbose, hedge-filled responses can use 2-3x more output tokens than a premium model that gives a tight answer. A few hundred extra tokens per response, multiplied by millions of requests, can erase much of the per-token discount.
Run the same prompt through both. Compare token counts and response quality. Sometimes the "expensive" model is actually cheaper end to end because it answers in 50 tokens instead of 200.
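A quick way to run that comparison is per-request cost arithmetic. The token counts below are hypothetical; the prices are GPT-4o mini's and GPT-4.1 mini's from the table:

```python
def cost_per_request(in_tokens, out_tokens, in_price, out_price):
    """Dollars per request; prices are $ per million tokens."""
    return (in_tokens * in_price + out_tokens * out_price) / 1_000_000

# Same 100-token prompt; the cheap model rambles, the pricier one is terse
verbose = cost_per_request(100, 200, 0.15, 0.60)  # GPT-4o mini, 200-token answer
terse = cost_per_request(100, 50, 0.40, 1.60)     # GPT-4.1 mini, 50-token answer
# Here the terse, pricier model is cheaper per request
```

The crossover only happens when the price gap is smaller than the verbosity gap, which is why it is worth measuring rather than assuming.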
Models charge more for output than input — usually 3-5x more. If your workload is output-heavy (content generation, code generation, long responses), the output price matters most. If your workload is input-heavy (RAG, document Q&A, summarization), the input price dominates.
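One way to see which price dominates is a blended per-token rate, weighted by your workload's input share. The input-share figures below are illustrative; the prices are Claude Sonnet 4's from the table:

```python
def blended_price(in_price, out_price, input_share):
    """Effective $ per million tokens for a workload where
    input_share of all tokens are input tokens."""
    return in_price * input_share + out_price * (1 - input_share)

# Claude Sonnet 4: $3.00 input, $15.00 output
rag_heavy = blended_price(3.00, 15.00, 0.90)  # document Q&A: 90% input tokens
gen_heavy = blended_price(3.00, 15.00, 0.30)  # content generation: 30% input
```

From the same price sheet, the generation-heavy workload pays about $11.40 per million tokens versus $4.20 for the RAG-heavy one — nearly 3x more.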
The headline price is rarely what you actually pay. Most providers offer:

- Prompt caching — repeated input tokens billed at a fraction of the standard rate
- Batch processing — discounted pricing for jobs that can wait hours instead of seconds

If you cache aggressively and run batch jobs, your effective price can be 30-60% below the published rate.
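The combined effect is easy to model. All the discount rates below are placeholders, not any provider's actual terms — check your provider's pricing page:

```python
def effective_input_price(list_price, cache_hit_rate, cached_fraction, batch_discount):
    """Effective $ per million input tokens after caching and batch discounts.
    cached_fraction: what a cached token costs relative to list price.
    All rates here are illustrative placeholders, not real provider terms."""
    blended = list_price * (cache_hit_rate * cached_fraction + (1 - cache_hit_rate))
    return blended * (1 - batch_discount)

# Hypothetical: 50% cache hits billed at 25% of list, plus a 25% batch discount
price = effective_input_price(2.00, 0.50, 0.25, 0.25)
```

With those placeholder rates, a $2.00 list price drops to about $0.94 — roughly 53% below published, inside the 30-60% range above.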
Get the real monthly bill for your workload — across every model.
Open AI Cost Calculator →

For most workloads in 2026, the answer is: start on Gemini 2.0 Flash or GPT-4o mini. They are cheap enough that cost is not a constraint. Upgrade only if quality forces you to. Use the cost calculator to see what your actual bill would look like before you commit.