LLM Token Usage Calculator

Work out what an LLM workload actually costs before you commit to it. Paste a sample prompt or type the token counts straight in, set how often you'll call the model, and see the same workload priced across Claude, GPT, Gemini and DeepSeek side by side. All in your browser, all in USD (which is how every API bills, regardless of where you live).

Explain like I'm 5 (what even is this calculator?)

Big language models charge you for the words you send them and the words they send back. The unit is a token, which is roughly four English characters. This tool turns "I want to ask Claude 500 questions a day" into a number on a credit card statement, and shows what the same workload would cost on the other big AI providers so you can pick the cheapest one that does the job.

Calculate

Enter your numbers, then press Calculate.

Prove it

Token estimation uses the four-characters-per-token rule of thumb (the canonical Anthropic and OpenAI shorthand for English prose). Real tokenisers will give a slightly different count, usually within 15%. Cost is (tokens ÷ 1,000,000) × price-per-million, summed across input and output. Monthly totals use 30.4375 days (365.25 ÷ 12), so daily, monthly and annual figures stay consistent. Pricing is the standard non-cached, non-batch list price published by each provider.
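That formula can be sketched in a few lines of JavaScript. The function names, call volume and the $3 in / $15 out prices below are all illustrative, not any provider's actual rates or this page's actual source:

```javascript
// Illustrative sketch of the cost formula above; names and prices
// are made up, not taken from the page's source or any provider.
const DAYS_PER_MONTH = 365.25 / 12; // 30.4375

function costUSD(inputTokens, outputTokens, inputPricePerM, outputPricePerM) {
  return (inputTokens / 1e6) * inputPricePerM
       + (outputTokens / 1e6) * outputPricePerM;
}

// 500 calls a day, ~800 input and ~300 output tokens per call,
// at a hypothetical $3 in / $15 out per million tokens:
const perCall = costUSD(800, 300, 3, 15);
const monthly = perCall * 500 * DAYS_PER_MONTH;
```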

Useful? Save this calculator: press Ctrl + D to bookmark it.

Price history

Each time the prices on this page get re-verified, a snapshot lands in llm-pricing-history.json. The chart shows how the input price per 1M tokens has moved over time. Output prices follow the same shape and are listed in the table below the chart for each snapshot. The first run only has one data point, so the chart fills out as more verifications are recorded.

Which side of the bill to chart


Show raw price history (table)
Date | Model | Vendor | Input $/1M | Output $/1M

What this calculator is actually doing

An LLM API bills you per token. A token is roughly four characters of English, give or take, depending on the tokeniser. You pay one rate for the tokens you send (the prompt), and a higher rate for the tokens the model generates back. Multiply those two numbers by your call volume and you get a monthly bill. That's it. The maths is not difficult; the difficult bit is getting honest figures for the inputs, and noticing when one provider quietly changes a price.

The token estimator here uses the four-chars-per-token approximation rather than running a full tokeniser in your browser. The proper tokeniser bundles (tiktoken for OpenAI, Anthropic's tokeniser, Gemini's count-tokens endpoint) are either heavy enough to noticeably slow the page, locked behind an API call, or both. The four-chars-per-token shorthand is the same one Anthropic and OpenAI use in their own back-of-the-envelope docs, and it lands within about 15% of the true count for normal prose. If you need the exact figure, every provider has a free tokeniser tool on their site or a count-tokens endpoint.
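A minimal sketch of that approximation, with the ±15% band a real tokeniser usually lands inside. The function names are hypothetical; the page's real code may differ:

```javascript
// Four-characters-per-token rule of thumb. Names are illustrative.
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Bracket the estimate with the ±15% band real tokenisers tend to stay in.
function tokenRange(text) {
  const est = estimateTokens(text);
  return {
    low: Math.floor(est * 0.85),
    estimate: est,
    high: Math.ceil(est * 1.15),
  };
}

tokenRange("The quick brown fox jumps over the lazy dog.");
// { low: 9, estimate: 11, high: 13 }
```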

Honest caveats, in plain English

A few things this calculator does not model, all of which can move the real bill in either direction.

Tokenisation differs by model family

GPT, Claude and Gemini use different tokenisers. The same English paragraph might be 100 tokens for one and 110 for another. For code, JSON or Chinese, the gap can be wider. Call this calculator's number a sensible estimate, not an exact one.

Cached and batch pricing not modelled

Anthropic, OpenAI and Google all offer prompt caching, which knocks 50 to 90 percent off repeated input tokens (typically your system prompt and any large reference document). They also offer batch APIs that take roughly half off the list price in exchange for processing the request within 24 hours. If your workload is heavy on stable system prompts, your real bill will be markedly lower than the figure here.
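To see roughly what those discounts are worth, fold them into the cost formula. This is a sketch using the ballpark rates above (90% off cached input tokens, a flat 50% off for batch), not any provider's exact terms:

```javascript
// Sketch of discounted input cost. cacheDiscount and the flat 0.5
// batch multiplier are ballpark figures, not real provider terms.
function effectiveInputCost(tokens, pricePerM, opts = {}) {
  const { cachedFraction = 0, cacheDiscount = 0.9, batch = false } = opts;
  const cached = tokens * cachedFraction;
  const fresh = tokens - cached;
  let cost =
    (fresh / 1e6) * pricePerM +
    (cached / 1e6) * pricePerM * (1 - cacheDiscount);
  if (batch) cost *= 0.5;
  return cost;
}

// 1M input tokens at $3/M list, with 80% of them a cached system prompt:
const discounted = effectiveInputCost(1_000_000, 3, { cachedFraction: 0.8 });
// roughly $0.84 instead of $3.00
```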

Image, audio and vision tokens excluded

Sending an image or an audio clip costs tokens too, but the conversion rules are model-specific and ugly. Vision-enabled requests on GPT-4o, Claude and Gemini all have their own image-token-per-megapixel rules, and the audio models (Whisper, Gemini audio) bill per second rather than per token. None of that is modelled here. If your workload is multimodal, treat this number as the text-only floor.

Fine-tuning not included

Fine-tuned models cost more per token to call, and there is a one-off training fee on top. If you are using a fine-tuned model in production, look up the inference rate for your specific tuned model and substitute it for the relevant base price.

How to read the comparison table

The table sorts cheapest first. That's a useful starting point, but it is not the whole story. The cheap models are cheap because they are smaller; on a hard task they may need more retries, longer chain-of-thought, or hand-holding from a cleverer model in a fallback chain. Run a representative test on each candidate before committing your spending to the cheapest row.

If two providers are within a few percent of each other on price, the tiebreaker is usually quality, latency, or which one your team already has accounts and observability set up for. The cost of switching providers is real, and almost never worth paying for a small price delta.

Trimming an LLM bill, in rough order of impact

If the number this calculator just produced gave you a fright, the highest-leverage moves are usually:

  • Cap output tokens. Set max_tokens. Models will happily ramble at 75 dollars per million if you let them.
  • Move the workhorse to a smaller model. Most tasks do not need the flagship. Sonnet, Haiku, GPT-5.4 mini, Gemini Flash and DeepSeek V4-Flash will all handle classification, extraction and short summarisation for a fraction of the flagship cost.
  • Turn on prompt caching. If your system prompt is more than a few hundred tokens and you call the API more than once, caching is free money.
  • Batch the non-urgent stuff. Anything that doesn't need to come back inside ten seconds can run on the batch API at half price.
  • Tighten the prompt. Long, hand-crafted system prompts with twelve few-shot examples are reassuring to write but expensive to run a million times. A short prompt that's been tested properly usually beats a long one.
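Before touching any code, you can price each of these moves by re-running the cost formula with the changed assumption. A sketch with hypothetical prices and token counts:

```javascript
// Hypothetical workload: 500 calls/day. Prices ($/1M tokens) and
// token counts are illustrative, not real list prices.
const cost = (inTok, outTok, inPrice, outPrice) =>
  (inTok / 1e6) * inPrice + (outTok / 1e6) * outPrice;

const callsPerMonth = 500 * 30.4375;

const baseline = callsPerMonth * cost(1200, 900, 3, 15);      // no output cap
const capped   = callsPerMonth * cost(1200, 300, 3, 15);      // max_tokens cap
const smaller  = callsPerMonth * cost(1200, 300, 0.25, 1.25); // cheaper tier
// Each step takes a large bite: baseline > capped > smaller.
```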

Related calculators

Other tools for the running-a-business numbers around AI work.

Frequently asked questions

Why is my actual bill different from this estimate?

Three reasons usually. The four-chars-per-token approximation is close but not exact (about ±15% for English prose, more for code or non-Latin scripts). Real workloads include retries, system prompts, conversation history and the occasional runaway response. And this tool does not model prompt caching, batch discounts, image or audio tokens, or vision add-ons. Treat the figure as a first-pass estimate, not an invoice.

What is a token?

A token is the unit a language model bills on. Roughly four characters of English text, or about three quarters of a word. Different model families use different tokenisers, so the exact count for the same prompt varies between providers. For the precise number, paste the text into the provider's official tokeniser tool or call their count-tokens endpoint.

Does this include caching or batch discounts?

No. The figures are standard, non-cached, non-batch list prices. Prompt caching can knock 50 to 90 percent off repeated input tokens, and batch APIs take another 50 percent off in exchange for slower turnaround. High-volume workloads with stable system prompts will pay less than this calculator suggests.

How often is the pricing updated?

The "Last verified" date on the page tells you when the figures were last checked against the providers' official pricing pages. Pricing is baked into the page so it loads fast and works without any API calls, but that means it can drift if a provider drops their prices and we have not yet pushed the update. Sanity-check against the provider's own pricing page before signing off a budget.

What about embedding models?

Embedding models are priced separately, with input-only billing and rates two or three orders of magnitude lower than chat models. This calculator focuses on chat and completion models, which is where the real spend sits. For an embedding cost, multiply your total document tokens by the provider's embedding rate (often a few cents per million).
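That arithmetic is one line. The $0.02-per-million default below is illustrative ("a few cents per million"); check your provider's actual embedding rate:

```javascript
// Embedding cost: input-only billing at a rate far below chat models.
// The default price here is illustrative, not a real list price.
function embeddingCostUSD(totalDocTokens, pricePerM = 0.02) {
  return (totalDocTokens / 1e6) * pricePerM;
}

const bill = embeddingCostUSD(50_000_000); // 50M document tokens ≈ $1
```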

Does this calculator send my prompts anywhere?

No. Everything runs in your browser. The text you paste never leaves your device.