Keyword Density Calculator

Author: Andrew Laws

Paste a draft, a published page, or a competitor's article. The tool returns total and unique word counts, and three density tables: single words, two-word phrases, and three-word phrases. Useful for spotting accidental keyword stuffing and for checking that a page actually covers the topic you think it does.

Explain like I'm 5 (what even is this calculator?)

It counts how many times each word and phrase appears in your text and turns the count into a percentage. The bigger the percentage, the more dominant that word or phrase is. If "running shoes" comes out at 6%, the page is shouting; if it does not appear at all on a page about running shoes, the page is not actually on topic.

Paste your text

Browser-only. Tokenisation and counting run locally. Nothing leaves the device.

Prove it

Tokenisation: lowercase the text, normalise smart quotes to ASCII apostrophes, treat em and en dashes as separators, then split on anything that is not a letter, digit, internal hyphen or internal apostrophe. Density is count divided by total tokens (counted before stop-word filtering), multiplied by 100. With the stop-word filter on, any unigram, bigram or trigram that contains a stop word is excluded from the table, but the denominator stays the same: every density is still divided by the full token count.
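
The steps above can be sketched in a few lines of Python. This is an illustration of the described rules, not the tool's own source: the names tokenise and density_table and the regular expression are assumptions made for the example.

```python
import re
from collections import Counter

# A token is a run of letters/digits, optionally joined by single internal
# hyphens or apostrophes ("long-tail", "don't"). [^\W_] means "a word
# character that is not an underscore", i.e. a letter or digit.
TOKEN_RE = re.compile(r"[^\W_]+(?:['-][^\W_]+)*")


def tokenise(text: str) -> list[str]:
    """Lowercase, normalise quotes and dashes, then extract tokens."""
    text = text.lower()
    # Smart single quotes become ASCII apostrophes.
    text = text.replace("\u2018", "'").replace("\u2019", "'")
    # Em and en dashes become spaces, so they separate words.
    text = text.replace("\u2014", " ").replace("\u2013", " ")
    return TOKEN_RE.findall(text)


def density_table(tokens: list[str], top: int = 10) -> list[tuple[str, int, float]]:
    """Return (word, count, density %) rows, most frequent first."""
    total = len(tokens)
    if total == 0:
        return []
    counts = Counter(tokens)
    return [(word, n, round(100 * n / total, 2)) for word, n in counts.most_common(top)]
```

For example, density_table(tokenise("Running shoes are shoes. Shoes!"))[0] is ("shoes", 3, 60.0): three occurrences out of five tokens.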

What keyword density actually tells you

Keyword density is the share of a page's words that match a particular term or phrase. It is a counting exercise, not a ranking signal. Search engines stopped treating raw density as a quality measure long before BERT and embedding-based retrieval changed how pages are evaluated.

So why bother running the numbers? Because density is a useful diagnostic even when it is not a target. Two patterns are worth catching: a primary topic that hardly appears on the page (the page is not really about what you think it is about), and a phrase that appears so often the prose reads as stuffed. The first is more common than people think. The second is rarer than it used to be, but still trips up content briefs that demand a keyword be repeated a fixed number of times.

The bigram and trigram tables matter more than the unigram one

Single-word counts are dominated by stop words unless you filter them out, and even with the filter on, a single word like "marketing" or "London" tells you very little about the page's actual angle. Two-word and three-word phrases are where the topic shows itself. A page on rental yields should have "rental yield", "gross yield" and "net yield" in the bigram table. A page about UK mortgages should produce "stamp duty", "fixed rate" and "monthly payment" near the top of the bigram and trigram tables. If those phrases do not appear, the page is missing the vocabulary readers and crawlers expect.
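
A minimal sketch of how the phrase tables can be built from a token list. The function names are hypothetical. It also assumes that phrases are taken as a sliding window over the whole token stream, ignoring sentence boundaries, which the page does not specify.

```python
from collections import Counter


def ngrams(tokens: list[str], n: int) -> list[str]:
    """Return every run of n consecutive tokens, joined with spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def phrase_table(tokens: list[str], n: int, top: int = 10) -> list[tuple[str, int, float]]:
    """Return (phrase, count, density %) rows, most frequent first."""
    total = len(tokens)  # density is measured against total tokens, not total phrases
    counts = Counter(ngrams(tokens, n))
    return [(p, c, round(100 * c / total, 2)) for p, c in counts.most_common(top)]
```

With the eight tokens of "gross yield and net yield beat gross yield", the bigram "gross yield" appears twice, so its row is ("gross yield", 2, 25.0).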

Common mistakes the tool will catch

Repeating an exact-match phrase verbatim across an article. The trigram table will surface it. Replace some occurrences with a synonym or rewrite the sentence.

Forgetting the topic entirely after the first paragraph: the bigram and trigram tables go quiet on the topic phrase from row two onwards.

A heading-heavy page where the body never picks up the same phrasing as the H2s: the unigrams from the headings dominate, but the body's bigrams disagree.

What the stop-word filter does, and when to switch it off

The filter removes around 175 of the most common English words: articles, conjunctions, prepositions, pronouns, common contractions. With the filter on, those words never appear in the unigram table, and any bigram or trigram that contains one of them is dropped from the phrase tables too. That is what you want most of the time. Switch the filter off when you are checking a specific phrase that genuinely needs a stop word in it, like "cat in the hat" or "of the year". Switch it off too when you are auditing a page for natural-sounding prose: a healthy article has a high proportion of stop words, and seeing them in the raw counts can reassure you that the prose is not over-engineered.
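
The filter's behaviour can be sketched as follows. The stop list here is a tiny stand-in for illustration, not the tool's real list of around 175 words, and filtered_table is a hypothetical name.

```python
from collections import Counter

# Hypothetical mini stop list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "is", "it", "to", "for"}


def filtered_table(tokens, n, stop_words=STOP_WORDS, top=10):
    """Return (phrase, count, density %) rows for n-grams with no stop word in them."""
    total = len(tokens)  # the denominator is NOT reduced by filtering
    grams = (tokens[i:i + n] for i in range(len(tokens) - n + 1))
    counts = Counter(
        " ".join(gram) for gram in grams
        if not any(tok in stop_words for tok in gram)
    )
    return [(p, c, round(100 * c / total, 2)) for p, c in counts.most_common(top)]
```

Passing stop_words=set() is the equivalent of switching the filter off: phrases such as "of the year" then reappear in the trigram table.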

Edge cases the tokeniser handles

Hyphenated words like "long-tail" stay as one token. Contractions like "don't" and "it's" stay as one token, with the apostrophe preserved. Numbers are kept and counted. Em dashes and en dashes are treated as separators rather than parts of words, so "test" and "really" joined by a dash tokenise the same way as "test really". Smart quotes from Word and Google Docs are normalised to ASCII apostrophes before counting, so the same word does not appear twice with a different quote character.

Frequently asked questions

What is a healthy keyword density?

There is no fixed target. Modern search engines do not score pages on a density ratio, and chasing a specific percentage usually produces stilted prose. Use the tables to spot a primary term that barely appears, or a phrase that has been repeated so often it reads as stuffed. A primary keyword sitting between 0.5% and 2.5% in natural prose is normal. Above 4% on a single term is worth a second look.

Does keyword density still matter for SEO in 2026?

Density itself is not a ranking factor. What still matters is whether the page actually covers the topic with the right vocabulary. The bigram and trigram tables are more useful than the single-word table for that, because they surface the phrases a page is genuinely about.

Why does the tool refuse to compute on short text?

Below 50 words, density percentages are noise. A word used three times in a 30-word snippet shows 10% density and tells you nothing about a real page. The threshold is set at 50 words so the percentages are at least directionally meaningful.
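
As a rough sketch of that guard: the 50-word threshold comes from the answer above, while the function name and the choice to return None for short text are assumptions made for the example.

```python
MIN_WORDS = 50  # threshold stated in the FAQ answer


def density_or_none(count: int, total_words: int, min_words: int = MIN_WORDS):
    """Return density as a percentage, or None when the text is too short."""
    if total_words < min_words:
        return None  # too few words for the percentage to mean anything
    return 100 * count / total_words
```

Here density_or_none(3, 30) returns None rather than a misleading 10%, while density_or_none(3, 300) returns 1.0.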

Does it send my text anywhere?

No. Tokenisation, counting and ranking all happen in your browser. Disconnect the network and the tool still works. Unpublished drafts and client copy stay on your device.

Last updated 29 April 2026 by Andrew Laws.