Embedding Cost Calculator

Work out what it actually costs to turn a pile of documents into vectors. Pick a model (or drop in a custom price), tell it how big the corpus is, and decide whether you're embedding once or refreshing on a schedule. The calculator returns the one-off bill, the monthly run-rate, and a total over the period you care about. USD, list price, browser-only.

Explain like I'm 5 (what even is this calculator?)

An embedding is a number-version of a chunk of text. You pay a model to make one for every chunk of your documents, and then you keep them in a database so you can find similar chunks later. This calculator works out the embedding bill. It does not do the database bill. That's a separate problem.

Calculate

Corpus

1.3 is a fair default for English prose. Code, JSON, or non-English text usually runs higher.

Model
Refresh schedule

Press Calculate to see the bill.

Prove it

Total tokens equals documents × average words × words-to-tokens multiplier. The one-off cost is total tokens divided by 1,000,000, multiplied by the model's price per million. Monthly cost is the same per-refresh figure multiplied by refreshes per month (one for monthly, ~4.345 for weekly, ~30.4375 for daily). Total cost over the period is one-off plus monthly times months. List prices only.
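That arithmetic can be written down directly. A minimal sketch, with made-up inputs (the document count, word count, and price below are illustrative, not the calculator's defaults):

```python
def embedding_cost(docs, avg_words, tokens_per_word=1.3,
                   price_per_million=0.02,   # illustrative USD list price
                   refreshes_per_month=0.0, months=0):
    """Return (one-off, monthly, total) embedding cost in USD."""
    total_tokens = docs * avg_words * tokens_per_word
    one_off = total_tokens / 1_000_000 * price_per_million
    monthly = one_off * refreshes_per_month
    return one_off, monthly, one_off + monthly * months

# Refresh multipliers from the text: monthly = 1, weekly ≈ 4.345, daily ≈ 30.4375.
one_off, monthly, total = embedding_cost(
    docs=10_000, avg_words=500, refreshes_per_month=4.345, months=12)
```

Ten thousand 500-word documents at 1.3 tokens per word is 6.5 million tokens; the rest is multiplication.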

Useful? Save this calculator: press Ctrl + D to bookmark it.

What you're actually paying for

An embedding model takes text in and gives you a list of numbers out. That list is a vector, and its length is the model's dimension count: 1536 for OpenAI's small model, 3072 for the large one, 1024 for Cohere and Voyage. Each chunk of your corpus gets its own vector. You pay by the token. Output dimensions don't change the bill, only the price per million input tokens does.

The reason embedding bills look small next to LLM inference bills is that you only do it once per chunk per refresh. A ten-million-token corpus on text-embedding-3-small is twenty cents. The same workload on text-embedding-3-large is $1.30. Annoying, sure, but rarely the line on the invoice that keeps the finance team awake. The LLM bill, the one that scales with queries, is almost always the bigger fish.

Picking a model without overthinking it

Default to text-embedding-3-small. It's cheap, fast, and good enough for the vast majority of retrieval workloads. If you measure your retrieval quality (recall@k, MRR, that sort of thing) and find the small model misses on hard queries that you can demonstrate matter to your users, then upgrading to large or to a Voyage or Cohere model is a real decision worth running an A/B on. If you haven't measured, you're guessing.

The custom-price option exists for two cases: you've negotiated a volume rate that beats the published price, or you're running an open-source embedder yourself (BGE, GTE, the Sentence Transformers family) and want to plug in your own per-token cost based on GPU time.
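For the self-hosted case, the per-token figure falls out of GPU rental cost and measured throughput. A sketch under stated assumptions (the $1.20/hour rate and 20,000 tokens/second are invented; measure your own):

```python
def self_hosted_price_per_million(gpu_cost_per_hour, tokens_per_second):
    """Effective USD per million tokens for a self-hosted embedder.
    Both inputs are yours to measure; the example figures are made up."""
    tokens_per_hour = tokens_per_second * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

# e.g. a $1.20/hour GPU pushing 20,000 tokens/s through a BGE-class model
price = self_hosted_price_per_million(1.20, 20_000)  # ≈ $0.017 per million
```

That's the number to type into the custom-price box, plus whatever overhead (idle time, retries) your pipeline actually incurs.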

How big is "big" for an embedding workload?

A few rough markers. A typical company knowledge base sits between 50,000 and 5,000,000 tokens, which is pennies to single-digit dollars on any of the cheap models. A medium-sized public help centre lands around 10 to 50 million tokens, still under twenty dollars on the small model. Where it starts to bite is corpora in the hundreds of millions of tokens, or workloads where you re-embed the whole corpus weekly or daily because the source data is volatile.

If you're staring at a daily-refresh number that looks alarming, the right move is almost always incremental embedding: only re-embed the chunks that actually changed. Most pipelines do this badly or not at all, and it's the single biggest lever on a recurring embedding bill.
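The usual way to do incremental embedding is to hash each chunk's text and skip anything whose hash hasn't moved. A minimal sketch (the function name and chunk-id scheme are assumptions, not a particular pipeline's API):

```python
import hashlib

def chunks_to_reembed(chunks, seen_hashes):
    """Return ids of chunks whose text changed since the last run.
    `seen_hashes` maps chunk id -> sha256 hex of the text last embedded."""
    changed = []
    for chunk_id, text in chunks.items():
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if seen_hashes.get(chunk_id) != digest:
            changed.append(chunk_id)
            seen_hashes[chunk_id] = digest
    return changed

# First run embeds everything; after one edit, only the edited chunk goes back.
store = {}
first = chunks_to_reembed({"a": "hello", "b": "world"}, store)
second = chunks_to_reembed({"a": "hello", "b": "world!"}, store)
```

If 2% of a corpus changes per day, the daily bill drops to roughly 2% of the full-refresh figure the calculator shows.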

Things this calculator deliberately doesn't do

  • Storage cost. Pinecone, Weaviate, Qdrant, or pgvector all want a monthly figure for keeping the vectors hot and queryable. That's a separate line, and the RAG Pipeline Cost Calculator covers it.
  • Re-embedding only what changed. The calculator assumes a worst-case full-corpus refresh on whatever schedule you pick. Your real bill, with smart incremental updates, will usually be a small fraction of the figure shown.
  • Batch discounts. List prices, not the 50% off you sometimes get on async batch APIs. If you're batching, halve the figure here.
  • Quality. A cheap embedding model that retrieves the wrong chunks is more expensive than an expensive one that retrieves the right ones, because every retrieval failure costs you an LLM call and an unhappy user. The bill is one input to the decision, not the whole decision.

Related calculators

Embedding cost is one slice of a RAG bill. These cover the rest.

Frequently asked questions

What is an embedding?

An embedding is a list of numbers that captures the meaning of a chunk of text. Two passages about the same topic land near each other in that number-space, which is what makes vector search work. You pay an embedding model to turn each chunk of your corpus into one of these vectors.
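"Near each other in number-space" is usually measured with cosine similarity. A toy sketch with invented 3-dimensional vectors (real models return 1024 to 3072 numbers, but the arithmetic is the same):

```python
import math

def cosine_similarity(a, b):
    """1.0 means the vectors point the same way; near 0 means unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy embeddings: "cat" and "kitten" should score closer than "cat" and "invoice".
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]
```

Vector search is just this comparison run against every stored vector (with an index to avoid doing it brute-force).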

Why 1.3 tokens per word?

Roughly the ratio you get on average English prose with the OpenAI tokeniser. Code, JSON, and non-English text run higher, often closer to 1.5 or 2. If your corpus is mostly schemas or French legal prose, bump the multiplier or use a tokeniser to count properly.

When does refreshing the embeddings actually matter?

Only when the source documents change. If you're indexing a stable knowledge base, embed once and forget. If your corpus updates, you need a refresh strategy. In practice, you'll usually only re-embed the chunks that changed, which costs a fraction of the figure here.

Why is text-embedding-3-large six times the price?

Higher dimension count (3072 vs 1536) and a heavier model produce sharper retrieval, especially on hard queries where the small model gets confused. Worth it if retrieval quality is the bottleneck.

Does this include the storage cost in a vector database?

No. This is just the embedding bill. The monthly bill from Pinecone, Weaviate, Qdrant, or self-hosted pgvector is a separate line, covered by the RAG Pipeline Cost Calculator.

Does this calculator send my numbers anywhere?

No. Everything runs in your browser. Nothing is uploaded.