LLM Benchmark Cost Calculator
Estimate what a benchmark run will cost across a set of models. Pick the models, then set the input and output token counts and the number of calls each model gets.
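The page does not show its formula or price list. A common scheme, assumed here, is per-million-token pricing: each call costs input tokens × input price plus output tokens × output price, divided by 1,000,000, and a model's cost is that amount times its number of calls. The `Price` class and `run_cost` function are hypothetical names, and the prices below are illustrative placeholders, not values read from the calculator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Price:
    """Assumed pricing model: USD per one million tokens."""
    input_per_m: float
    output_per_m: float


def run_cost(price: Price, input_tokens: int, output_tokens: int, calls: int) -> float:
    """USD cost of `calls` identical calls, each with the given token counts."""
    per_call = (input_tokens * price.input_per_m
                + output_tokens * price.output_per_m) / 1_000_000
    return calls * per_call


# Illustrative placeholder prices (assumed, not taken from the page).
example = Price(input_per_m=0.05, output_per_m=0.40)
print(f"${run_cost(example, 1_000, 1_000, 1):.6f}")  # $0.000450
```

With these placeholder prices, one call of 1,000 input and 1,000 output tokens comes to $0.000450, the same figure as the gpt-5-nano row below; whether that reflects the calculator's actual gpt-5-nano prices is worth checking against the provider's price list.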
Models: gpt-5-nano · gpt-5.4-nano · groq:llama-3.1-8b · mistral-small-4 · grok-4.1-fast · mercury-2 · together:gemma-4-31b · gemini-3.1-flash-lite · haiku-4.5 · together:qwen3.5-9b
Total benchmark cost
$0.0132
10 calls · 10 models
Per-model breakdown
| Model | Calls | Input tokens | Output tokens | Cost |
|---|---|---|---|---|
| gpt-5-nano | 1 | 1,000 | 1,000 | $0.000450 |
| gpt-5.4-nano | 1 | 1,000 | 1,000 | $0.001450 |
| groq:llama-3.1-8b | 1 | 1,000 | 1,000 | $0.000130 |
| mistral-small-4 | 1 | 1,000 | 1,000 | $0.000750 |
| grok-4.1-fast | 1 | 1,000 | 1,000 | $0.000700 |
| mercury-2 | 1 | 1,000 | 1,000 | $0.001000 |
| together:gemma-4-31b | 1 | 1,000 | 1,000 | $0.000700 |
| gemini-3.1-flash-lite | 1 | 1,000 | 1,000 | $0.001750 |
| haiku-4.5 | 1 | 1,000 | 1,000 | $0.006000 |
| together:qwen3.5-9b | 1 | 1,000 | 1,000 | $0.000250 |
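As a check on the summary figures, the headline total is the sum of the per-model costs in the breakdown table, and the call count is the sum of the Calls column. The sketch below copies the table rows verbatim and adds them up; `rows` and the other variable names are illustrative.

```python
# Rows copied from the per-model breakdown table:
# (model, calls, input tokens, output tokens, cost in USD)
rows = [
    ("gpt-5-nano", 1, 1_000, 1_000, 0.000450),
    ("gpt-5.4-nano", 1, 1_000, 1_000, 0.001450),
    ("groq:llama-3.1-8b", 1, 1_000, 1_000, 0.000130),
    ("mistral-small-4", 1, 1_000, 1_000, 0.000750),
    ("grok-4.1-fast", 1, 1_000, 1_000, 0.000700),
    ("mercury-2", 1, 1_000, 1_000, 0.001000),
    ("together:gemma-4-31b", 1, 1_000, 1_000, 0.000700),
    ("gemini-3.1-flash-lite", 1, 1_000, 1_000, 0.001750),
    ("haiku-4.5", 1, 1_000, 1_000, 0.006000),
    ("together:qwen3.5-9b", 1, 1_000, 1_000, 0.000250),
]

total_cost = sum(cost for _, _, _, _, cost in rows)
total_calls = sum(calls for _, calls, _, _, _ in rows)
model_count = len(rows)

print(f"${total_cost:.4f}")                           # $0.0132
print(f"{total_calls} calls · {model_count} models")  # 10 calls · 10 models
```

The unrounded sum is $0.01318, so the page appears to round the headline figure to four decimal places. haiku-4.5 alone accounts for about 46% of that total.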
Cost sensitivity
Cost vs input tokens (output held at 1,000 tokens · 1 call/model)
Cost vs output tokens (input held at 1,000 tokens · 1 call/model)
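Under the same assumed linear per-million-token pricing, each sensitivity curve is a straight line: sweeping input tokens with output fixed gives a slope equal to the input price per token, and sweeping output tokens gives the output price per token. Because every row in the breakdown table uses 1,000 input and 1,000 output tokens, the table alone only pins down the sum of a model's two prices; the two panels are what separate them. The sketch below generates the points for one model; `cost_usd`, `sensitivity`, and the prices are hypothetical.

```python
def cost_usd(input_tokens, output_tokens, calls, input_per_m, output_per_m):
    """Assumed linear pricing: prices are USD per million tokens."""
    return calls * (input_tokens * input_per_m
                    + output_tokens * output_per_m) / 1_000_000


def sensitivity(sizes, fixed_tokens, calls, input_per_m, output_per_m, vary="input"):
    """Return (token_count, cost) points for one model's sensitivity curve.

    vary="input":  sweep input tokens, hold output tokens at fixed_tokens.
    vary="output": sweep output tokens, hold input tokens at fixed_tokens.
    """
    if vary not in ("input", "output"):
        raise ValueError("vary must be 'input' or 'output'")
    points = []
    for n in sizes:
        if vary == "input":
            cost = cost_usd(n, fixed_tokens, calls, input_per_m, output_per_m)
        else:
            cost = cost_usd(fixed_tokens, n, calls, input_per_m, output_per_m)
        points.append((n, cost))
    return points


# Hypothetical prices for a single model (USD per 1M tokens).
sizes = [0, 1_000, 2_000, 4_000]
vs_input = sensitivity(sizes, fixed_tokens=1_000, calls=1,
                       input_per_m=0.10, output_per_m=0.40)
vs_output = sensitivity(sizes, fixed_tokens=1_000, calls=1,
                        input_per_m=0.10, output_per_m=0.40, vary="output")

for (n, ci), (_, co) in zip(vs_input, vs_output):
    print(f"{n:>5} tokens: vary input ${ci:.6f} | vary output ${co:.6f}")
```

Both curves pass through the same point at 1,000 tokens, which is the value the breakdown table reports; for a model whose output price exceeds its input price, the output curve is the steeper of the two.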