How much does an AI chatbot cost per conversation?

A typical 5-message support conversation uses about 3,000 input tokens (with system prompt and history) and 1,200 output tokens. Cost per conversation: $0.0008 on GPT-4o mini, $0.018 on GPT-4o, $0.024 on Claude Sonnet 4. Cheap models can run a chatbot for under $0.001 per conversation.

What is the cheapest LLM for customer support?

Gemini 2.0 Flash and GPT-4o mini are the cheapest options that handle support quality reasonably well. For $0.001 or less per conversation, both work for routine tier-1 questions. Escalate only the hard ones to a more expensive model.

How many conversations does $100 buy on GPT-4o?

At about $0.018 per conversation on GPT-4o, $100 buys roughly 5,500 customer support conversations. On GPT-4o mini, the same $100 buys 125,000 conversations. The 23x cost gap is the difference between a $100/month bill and a $2,300/month bill at the same volume.

Should I use one model or escalate?

For high-volume support, use a cheap model (Gemini Flash, GPT-4o mini) as the default and escalate only when (1) the bot is unsure, (2) the customer asks for a human, or (3) the question involves complex reasoning. Escalation logic typically routes 5-15% of conversations to the premium model and saves 60-80% of total cost.

Customer Support Chatbot LLM Cost — Real Breakdown for 2026

Last updated: April 20267 min readAI Tools

Customer support chatbots are one of the most common LLM use cases — and one of the most expensive when you pick the wrong model. Here is the real cost breakdown for running a support bot in 2026, with per-conversation, per-customer, and per-month numbers across every major model.

The Real Conversation Shape

A typical support conversation looks like this:

System prompt (fixed): 400 tokens — persona, tone, escalation rules, product knowledge summary
Conversation history (grows each turn): ~300 tokens per exchange
RAG context (per turn): ~800 tokens of retrieved knowledge base content
User message: ~80 tokens
Bot response: ~250 tokens

For a 5-message exchange (which is the median length), total token usage:

Turn	Input tokens	Output tokens
1	1,280 (system + RAG + user)	250
2	1,830 (+ history)	250
3	2,380 (+ history)	250
4	2,930 (+ history)	250
5	3,480 (+ history)	250

Total per conversation: ~11,900 input tokens, 1,250 output tokens.

Plug your real conversation shape into the calculator.

Open AI Cost Calculator →

Per-Conversation Cost by Model

Model	Per conversation	Per 1,000 conversations
Gemini 2.0 Flash	$0.00169	$1.69
GPT-4o mini	$0.00254	$2.54
Claude Haiku 3.5	$0.01452	$14.52
GPT-4o	$0.04225	$42.25
Claude Sonnet 4	$0.05445	$54.45
Claude Opus 4	$0.27225	$272.25

The cheap tier costs about 1/100th of Claude Opus 4. For high-volume support, the choice is obvious — but quality matters too.

Real Monthly Cost at Common Volumes

Volume	GPT-4o mini	GPT-4o	Claude Sonnet 4
100 conversations/day (3K/mo)	$7.62	$126.75	$163.35
500/day (15K/mo)	$38.10	$633.75	$816.75
2,000/day (60K/mo)	$152.40	$2,535.00	$3,267.00
10,000/day (300K/mo)	$762.00	$12,675.00	$16,335.00

At 10,000 daily conversations, switching from GPT-4o to GPT-4o mini saves $11,913/month — enough to fund another engineer. The quality loss for typical tier-1 support is usually negligible.

How Much to Charge Per Customer

If you sell support automation to other companies, your cost per active customer depends on conversation volume per customer per month:

Customer profile	Conversations/mo	Cost on GPT-4o mini	Cost on GPT-4o
Light (small business)	5	$0.013	$0.21
Standard (mid-market)	25	$0.064	$1.06
Heavy (enterprise dept)	150	$0.38	$6.34
Whale (high-volume support)	800	$2.03	$33.80

For a $99/month per-customer SaaS, the GPT-4o mini cost is invisible. The GPT-4o cost is also fine for most customers — $33.80 on a whale is still a tiny fraction of $99. On Claude Opus, the math breaks: a single whale would cost $169/month to serve.

The Escalation Pattern That Saves Money Without Hurting Quality

Default model: GPT-4o mini or Gemini 2.0 Flash
Confidence check: Ask the bot "are you confident in this answer?" or measure response length / structure
Escalation rules: Route to GPT-4o or Claude Sonnet 4 if (a) confidence is low, (b) user explicitly asks for help, (c) the question contains words like "billing," "cancel," "refund," "complaint"
Final tier: Hand off to a human if the bot escalates twice without resolving

In production, this pattern typically results in:

~85% of conversations stay on the cheap model
~12% escalate to the mid-tier model
~3% escalate to a human

Blended cost is ~1.5x the cheap model alone, not 10x. You get most of the quality benefit at a fraction of the all-premium cost.

Three Things That Will Blow Up Your Bill

1. Unbounded conversation history. If you never truncate, every message in a long conversation includes all prior messages. A 30-turn conversation can use 30,000+ input tokens for the last message alone. Always cap history at 10-20 messages, or summarize older turns.

2. Retrieving too many chunks. RAG systems default to 5-10 chunks per query. For chatbots, 3-5 is usually enough. Cutting chunks in half cuts input cost in half.

3. Verbose responses. Set max_tokens to 300 or 400 for chatbots. Without a cap, models will sometimes write essays. The cap makes cost predictable and responses more readable.

The Recommended Setup

For a 2026 customer support chatbot:

Default model: GPT-4o mini or Gemini 2.0 Flash
System prompt: 300-500 tokens, cached on Anthropic or via OpenAI auto-caching
RAG chunks: 3-5 per query, 400-600 tokens each
History cap: last 8 messages, summarize the rest
Max output: 300-400 tokens
Escalation: ~10% of traffic to GPT-4o or Claude Sonnet

Run the numbers in the AI Cost Calculator with the chatbot preset to see what your specific volume costs.

Project your support chatbot bill across every major model.

Open AI Cost Calculator →

Customer Support Chatbot LLM Cost — Real Breakdown for 2026

The Real Conversation Shape

Per-Conversation Cost by Model

Real Monthly Cost at Common Volumes

How Much to Charge Per Customer

The Escalation Pattern That Saves Money Without Hurting Quality

Three Things That Will Blow Up Your Bill

The Recommended Setup

Related Posts

AI Cost Calculator

SaaS LLM Pricing Guide

Cheapest LLM 2026