AI Token Limits & Costs
Context windows and per-token prices for major LLM APIs — ballpark figures for cost estimation.
Reference
Prices are USD per 1M tokens. Check provider docs for the latest figures; these change often.
OpenAI
| Model | Context | Input / 1M | Output / 1M |
|---|---|---|---|
| GPT-4o | 128K | $2.50 | $10.00 |
| GPT-4o mini | 128K | $0.15 | $0.60 |
| GPT-4.1 | 1M | $2.00 | $8.00 |
| GPT-4.1 mini | 1M | $0.40 | $1.60 |
| GPT-4.1 nano | 1M | $0.10 | $0.40 |
| o1 | 200K | $15.00 | $60.00 |
| o3-mini | 200K | $1.10 | $4.40 |
Anthropic
| Model | Context | Input / 1M | Output / 1M |
|---|---|---|---|
| Claude 3.5 Haiku | 200K | $0.80 | $4.00 |
| Claude 3.5 Sonnet | 200K | $3.00 | $15.00 |
| Claude 3 Opus | 200K | $15.00 | $75.00 |
| Claude 4 Sonnet | 200K | $3.00 | $15.00 |
| Claude 4 Opus | 200K | $15.00 | $75.00 |
Google
| Model | Context | Input / 1M | Output / 1M |
|---|---|---|---|
| Gemini 2.0 Flash | 1M | $0.10 | $0.40 |
| Gemini 2.5 Flash | 1M | $0.15 | $0.60 |
| Gemini 2.5 Pro | 1M | $1.25 (≤200K) / $2.50 (>200K) | $10.00 (≤200K) / $15.00 (>200K) |
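The tables above plug directly into a back-of-the-envelope cost estimate. A minimal sketch, where `PRICES` and `estimate_cost` are hypothetical names (not any provider's SDK) and the per-1M values are copied from the tables in this document:

```python
# Per-1M-token (input, output) prices in USD, copied from the tables above.
# These go stale quickly; check provider pricing pages before relying on them.
PRICES = {
    "gpt-4o":            (2.50, 10.00),
    "gpt-4o-mini":       (0.15, 0.60),
    "claude-3-5-sonnet": (3.00, 15.00),
    "gemini-2.0-flash":  (0.10, 0.40),
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request."""
    in_price, out_price = PRICES[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: 10K input tokens + 1K output tokens on GPT-4o
print(round(estimate_cost("gpt-4o", 10_000, 1_000), 4))  # → 0.035
```

Note that output tokens usually cost 4x the input price, so long completions dominate the bill even when prompts are large.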
Notes
- Cached input can drop prices by 50–90% — investigate prompt caching when reusing system prompts.
- Batch APIs offer ~50% off in exchange for up to 24-hour turnaround on OpenAI and Anthropic.
- 1 token ≈ 4 English characters or ~0.75 words on average. Code and non-English text typically use more tokens per character.
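The character/word rule of thumb above can be sketched as a quick pre-flight token estimate. This is a heuristic only, and `estimate_tokens` is a hypothetical helper; a real tokenizer (e.g. OpenAI's tiktoken) gives exact counts:

```python
# Rough token estimate for English text using the heuristics above:
# ~4 characters per token and ~0.75 words per token. Averaging the two
# smooths out texts with unusually long or short words.
def estimate_tokens(text: str) -> int:
    by_chars = len(text) / 4
    by_words = len(text.split()) / 0.75
    return round((by_chars + by_words) / 2)

sample = "The quick brown fox jumps over the lazy dog."
print(estimate_tokens(sample))
```

Expect this to undercount for code and most non-English languages, so pad estimates upward before budgeting.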
Last updated: