armandmcqueen.dev

LLM Benchmark Cost Calculator

Estimate what a benchmark run will cost across a set of models. Pick the models, set the input and output token counts, and choose how many calls each model gets.
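The page does not show its price data or formula. The sketch below assumes each model has separate per-million-token list prices for input and output, and that cost scales linearly with token counts and calls. The two price entries are assumptions for illustration, chosen because they reproduce the gpt-5-nano and haiku-4.5 rows of the per-model breakdown. `PRICES_PER_MTOK` and `model_cost` are hypothetical names, not part of the calculator.

```python
# Hypothetical per-million-token prices in USD: (input, output).
# Assumed for illustration only; the calculator's own price table is not shown.
PRICES_PER_MTOK = {
    "gpt-5-nano": (0.05, 0.40),
    "haiku-4.5": (1.00, 5.00),
}


def model_cost(model: str, calls: int, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of `calls` requests, each with the given token counts."""
    in_price, out_price = PRICES_PER_MTOK[model]
    # Prices are quoted per million tokens, so divide the token-weighted sum by 1e6.
    per_call = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
    return calls * per_call


for name in PRICES_PER_MTOK:
    print(f"{name}: ${model_cost(name, 1, 1_000, 1_000):.6f}")
```

Under these assumptions a single 1,000-in / 1,000-out call costs $0.000450 for gpt-5-nano and $0.006000 for haiku-4.5, and doubling `calls` doubles the cost.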

Selected models: gpt-5-nano, gpt-5.4-nano, groq:llama-3.1-8b, mistral-small-4, grok-4.1-fast, mercury-2, together:gemma-4-31b, gemini-3.1-flash-lite, haiku-4.5, together:qwen3.5-9b
Total benchmark cost: $0.0132 (10 calls across 10 models)

Per-model breakdown

Model                   Calls  Input tokens  Output tokens  Cost (USD)
gpt-5-nano                  1         1,000          1,000   $0.000450
gpt-5.4-nano                1         1,000          1,000   $0.001450
groq:llama-3.1-8b           1         1,000          1,000   $0.000130
mistral-small-4             1         1,000          1,000   $0.000750
grok-4.1-fast               1         1,000          1,000   $0.000700
mercury-2                   1         1,000          1,000   $0.001000
together:gemma-4-31b        1         1,000          1,000   $0.000700
gemini-3.1-flash-lite       1         1,000          1,000   $0.001750
haiku-4.5                   1         1,000          1,000   $0.006000
together:qwen3.5-9b         1         1,000          1,000   $0.000250
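If we assume the headline total is a plain sum over models, adding up the per-model costs from the breakdown reproduces both the total and the call and model counts. The `rows` list simply transcribes the table; the variable names are illustrative.

```python
# Rows transcribed from the per-model breakdown: (model, calls, cost in USD).
rows = [
    ("gpt-5-nano", 1, 0.000450),
    ("gpt-5.4-nano", 1, 0.001450),
    ("groq:llama-3.1-8b", 1, 0.000130),
    ("mistral-small-4", 1, 0.000750),
    ("grok-4.1-fast", 1, 0.000700),
    ("mercury-2", 1, 0.001000),
    ("together:gemma-4-31b", 1, 0.000700),
    ("gemini-3.1-flash-lite", 1, 0.001750),
    ("haiku-4.5", 1, 0.006000),
    ("together:qwen3.5-9b", 1, 0.000250),
]

# Sum costs and calls across all selected models.
total_cost = sum(cost for _, _, cost in rows)
total_calls = sum(calls for _, calls, _ in rows)

print(f"Total benchmark cost: ${total_cost:.4f}")  # Total benchmark cost: $0.0132
print(f"{total_calls} calls · {len(rows)} models")  # 10 calls · 10 models
```

The unrounded sum is $0.01318, which the page displays as $0.0132. Note that haiku-4.5 alone accounts for $0.006 of it, so under these settings it is the single largest contributor to the total.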

Cost sensitivity

Cost vs input tokens (chart, one line per selected model; output held at 1,000 tokens, 1 call per model)
Cost vs output tokens (chart, one line per selected model; input held at 1,000 tokens, 1 call per model)
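The two sensitivity charts can be approximated with a one-variable sweep: vary one token count while holding the other at 1,000 and using 1 call. This is a minimal sketch that assumes the same linear per-million-token pricing model as a typical API price sheet. The prices `IN_PRICE` and `OUT_PRICE` are hypothetical values for a single model, and the sweep range is an arbitrary illustrative choice, not the page's actual axis.

```python
# Hypothetical per-million-token prices (USD) for one model; illustrative only.
IN_PRICE, OUT_PRICE = 0.05, 0.40


def cost(input_tokens: int, output_tokens: int, calls: int = 1) -> float:
    """USD cost under an assumed linear per-million-token pricing model."""
    return calls * (input_tokens * IN_PRICE + output_tokens * OUT_PRICE) / 1_000_000


HELD = 1_000  # the token count held fixed in each chart
token_counts = range(0, 10_001, 2_000)  # illustrative sweep points

# "Cost vs input tokens": output fixed at HELD, 1 call.
input_sweep = [(n, cost(n, HELD)) for n in token_counts]
# "Cost vs output tokens": input fixed at HELD, 1 call.
output_sweep = [(n, cost(HELD, n)) for n in token_counts]

for (n, c_in), (_, c_out) in zip(input_sweep, output_sweep):
    print(f"{n:>6} tokens | vary input: ${c_in:.6f} | vary output: ${c_out:.6f}")
```

Under this assumption each curve is a straight line whose slope is the model's per-token price for that direction, so a model with expensive output tokens shows a much steeper output chart. Both curves pass through the same point at 1,000 tokens, which is the single-call cost shown in the per-model breakdown.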