Robots.txt Tester
Paste a robots.txt body, drop in a URL or path, pick a user-agent, and see whether that URL is allowed or blocked. The decision follows Google's spec: the most specific user-agent group wins, then the longest matching path rule, with Allow beating Disallow on equal length. Wildcards and end anchors are supported. Runs entirely in your browser.
Explain like I'm 5 (what even is this calculator?)
A robots.txt is a small text file at the root of a website that tells crawlers like Googlebot which paths they can fetch and which they should leave alone. The rules look simple, but the order, the wildcards, and the longest-match tie-break catch people out. This page lets you paste your file, test a URL, and see exactly which line in your robots.txt made the call.
Test a URL
Browser-only. Nothing is fetched, nothing is uploaded. The robots.txt body and URLs you paste stay on your device.
Results
| URL | Status | Winning rule | UA group |
|---|---|---|---|
Sitemap lines
Lines skipped
Prove it
Run a test first and the working appears here: which user-agent group matched, every Allow and Disallow rule that was considered, the length each one would score, and the rule that won. Same logic Google publishes for its own crawler.
Useful? Save this calculator: press Ctrl + D to bookmark it.
What robots.txt actually controls
A robots.txt sits at the root of a domain (https://example.com/robots.txt) and tells well-behaved crawlers which URL paths they may fetch. It is a request, not a fence. Major search engines respect it. Most spammers, scrapers and AI training bots ignore it unless their operator has decided otherwise. If you need to genuinely keep something off the internet, use access control on the server, not a robots.txt entry.
The file is also not a noindex tool. Disallow stops a crawler fetching the URL, which means Google may still index the URL based on links and anchor text, just without a snippet. If you want a page out of the index, the right tool is a meta robots noindex tag (or an X-Robots-Tag header) on the page itself, and the page must be crawlable for Google to see the noindex in the first place. Disallowing it instead is the classic mistake that leaves a page in the index forever.
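To make the difference concrete, here is a minimal Node/TypeScript sketch; the server and the /internal-report path are hypothetical, not part of this tool. The page stays crawlable, and the noindex signal rides on the response itself.

```ts
// Minimal sketch: de-indexing is done on the page's response, not in robots.txt.
// The URL must stay crawlable so Google can actually see the header.
import { createServer } from "node:http";

createServer((req, res) => {
  if (req.url?.startsWith("/internal-report")) {   // hypothetical path
    // Equivalent to <meta name="robots" content="noindex"> in the HTML head.
    res.setHeader("X-Robots-Tag", "noindex");
  }
  res.setHeader("Content-Type", "text/html");
  res.end("<!doctype html><title>Report</title><p>Crawlable, but not indexable.</p>");
}).listen(3000);
```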
How the matching actually works
Google's spec, which most search engines now follow, picks the user-agent group by most specific match. If your file has a Googlebot block and a * block, Googlebot uses the Googlebot block exclusively. It does not fall back to * for rules that are missing from the specific block. That is the single most common misunderstanding.
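As a rough sketch of that selection logic, in TypeScript (the function name and token list are illustrative, not this tester's real code):

```ts
// Minimal sketch of group selection: pick the most specific User-agent token
// that is a prefix of the requested crawler name, falling back to "*".
function selectGroupToken(tokens: string[], userAgent: string): string | undefined {
  const ua = userAgent.toLowerCase();
  const specific = tokens
    .filter(t => t !== "*" && ua.startsWith(t.toLowerCase()))
    .sort((a, b) => b.length - a.length)[0];       // longest prefix = most specific
  // No mixing: if a specific group matched, the "*" group is ignored entirely.
  return specific ?? (tokens.includes("*") ? "*" : undefined);
}

// "Googlebot-Image" picks the Googlebot group here, never the "*" group.
console.log(selectGroupToken(["*", "Googlebot"], "Googlebot-Image")); // "Googlebot"
```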
Within the chosen group, every Allow and Disallow is tested against the URL path. The rule with the longest matching pattern wins. On a tie, Allow beats Disallow. Wildcards (*) match any sequence of characters, and a dollar sign ($) anchors the end of the URL. Patterns are matched against the path and query string only; the host has no effect on the decision.
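The whole decision fits in a few lines. Here is a minimal TypeScript sketch of those rules, assuming the group has already been parsed into a simple shape; the names are illustrative, not this tool's actual implementation.

```ts
// Sketch of Google-style path matching: "*" becomes ".*", a trailing "$"
// anchors the end, the longest matching pattern wins, and Allow beats
// Disallow on a tie. The Rule shape and names are illustrative only.
interface Rule {
  type: "allow" | "disallow";
  pattern: string;               // e.g. "/search/", "/*.pdf$"
}

function patternToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters, then re-introduce the two supported tokens.
  const escaped = pattern
    .replace(/[.+?^${}()|[\]\\]/g, "\\$&")
    .replace(/\*/g, ".*");
  return escaped.endsWith("\\$")
    ? new RegExp("^" + escaped.slice(0, -2) + "$")   // "$" anchors the end of the URL
    : new RegExp("^" + escaped);                     // otherwise a prefix match
}

function isAllowed(rules: Rule[], pathAndQuery: string): boolean {
  let winner: Rule | undefined;
  for (const rule of rules) {
    if (!rule.pattern) continue;                     // empty Disallow/Allow contributes no rule
    if (!patternToRegExp(rule.pattern).test(pathAndQuery)) continue;
    if (!winner || rule.pattern.length > winner.pattern.length) {
      winner = rule;                                 // longer match wins
    } else if (rule.pattern.length === winner.pattern.length && rule.type === "allow") {
      winner = rule;                                 // Allow wins on equal length
    }
  }
  return !winner || winner.type === "allow";         // no matching rule means allowed
}
```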
Common mistakes the tester catches
The first is mixing case. Paths are case-sensitive, so Disallow: /Admin/ does nothing on a site whose admin path is /admin/. The second is forgetting the longer-Allow override: Disallow: /search/ will not block /search/index.html if you also have Allow: /search/index. The third is putting rules above the first User-agent line, which makes them belong to no group, so they are silently ignored. The fourth is using regular expressions: robots.txt does not support them, only * and $.
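Reusing the hypothetical Rule and isAllowed sketch from the previous section, the first two mistakes look like this:

```ts
const rules: Rule[] = [
  { type: "disallow", pattern: "/Admin/" },       // mistake 1: wrong case
  { type: "disallow", pattern: "/search/" },
  { type: "allow",    pattern: "/search/index" }, // longer pattern, so it overrides
];

console.log(isAllowed(rules, "/admin/settings"));        // true  – /Admin/ never matches /admin/
console.log(isAllowed(rules, "/search/index.html"));     // true  – Allow (13 chars) beats Disallow (8)
console.log(isAllowed(rules, "/search/results?q=test")); // false – only Disallow: /search/ matches
```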
When to test before you ship
Any change to a robots.txt file is a change to what Google can fetch. The safe sequence is: edit locally, paste the new body in here with a representative list of URLs (a few from each section of the site), check the verdicts, then deploy. If you maintain different rules for different bots, run the same URL list through each user-agent. The most painful robots.txt regressions are the ones that were technically valid and quietly stopped a section being crawled for weeks.
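A pre-deploy sweep can be as small as a nested loop. This hypothetical sketch reuses the selectGroupToken and isAllowed helpers from the sections above (parsing the robots.txt body into groups is not shown):

```ts
// Run every representative URL through every user-agent you care about.
const groups: Record<string, Rule[]> = {
  "*":         [{ type: "disallow", pattern: "/tmp/" }],
  "Googlebot": [{ type: "disallow", pattern: "/search/" },
                { type: "allow",    pattern: "/search/index" }],
};
const userAgents = ["Googlebot", "Googlebot-Image", "Bingbot"];
const urls = ["/", "/search/index.html", "/tmp/report.csv"];

for (const ua of userAgents) {
  const token = selectGroupToken(Object.keys(groups), ua);
  for (const url of urls) {
    const verdict = token ? isAllowed(groups[token], url) : true;
    console.log(`${ua}  ${url}  ${verdict ? "allowed" : "blocked"}`);
  }
}
// Note that Googlebot is allowed on /tmp/report.csv: it uses its own group
// exclusively and never falls back to the "*" rules.
```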
Related calculators
Crawl rules are one half of indexing. These check the other half.
Frequently asked questions
Does this tool fetch my live robots.txt?
No. You paste the body of the robots.txt yourself and the matching runs entirely in your browser. There is no network call. That keeps unpublished or staging robots files private, and means you can test changes before you deploy them rather than after.
Why does my Disallow not block the URL I expected?
The most common reason is a longer Allow rule overriding it. Google picks the rule with the longest matching pattern, so Disallow: /a/ loses to Allow: /a/b/ on URLs under /a/b/. The other common cause is case: paths are case-sensitive, so Disallow: /Private/ does not block /private/. Open the prove-it panel below the result to see every candidate rule and which length won.
How does the user-agent matching work?
Group selection uses the most specific token whose name is a prefix of the requested user-agent, case-insensitively. So Googlebot-Image gets the Googlebot-Image group if one exists, falling back to the Googlebot group, falling back to the * group. If none match, everything is allowed by default.
What do * and $ mean in robots.txt patterns?
An asterisk matches any sequence of characters, including empty. A dollar sign at the end of a pattern anchors it to the end of the URL, so Disallow: /*.pdf$ blocks paths ending in .pdf but not /file.pdf.txt. Both are Google extensions to the original protocol and are widely supported.
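Using the hypothetical patternToRegExp sketch from earlier, the anchor behaviour looks like this:

```ts
const pdfOnly = patternToRegExp("/*.pdf$");
console.log(pdfOnly.test("/reports/q3.pdf"));                  // true
console.log(pdfOnly.test("/file.pdf.txt"));                    // false – "$" anchors the end
console.log(patternToRegExp("/*.pdf").test("/file.pdf.txt"));  // true  – no anchor, prefix match
```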
Why is an empty Disallow line treated as allow?
It is the convention from the original 1994 robots.txt note: a line of Disallow with nothing after the colon means there is nothing to disallow, so everything is allowed for that user-agent. Google still treats it that way. An empty Allow line is a no-op and does not contribute a matching rule.
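In the hypothetical isAllowed sketch above, the same convention shows up as a skipped rule:

```ts
// An empty pattern contributes no rule, so "Disallow:" blocks nothing.
console.log(isAllowed([{ type: "disallow", pattern: "" }], "/any/path")); // true
```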
Will blocking a page in robots.txt remove it from Google?
No, and this is a frequent SEO mistake. Disallow stops the crawler fetching the URL. The URL itself can still appear in the index (without a snippet) based on links pointing at it. To actually de-index a page, leave it crawlable and serve a noindex meta tag or X-Robots-Tag header. Disallowing a page Google has already indexed can leave it in the index for a long time.