Robots.txt Generator

Configure your robots.txt rules below and download the file.


What Is Robots.txt?

Robots.txt is a plain-text file placed at the root of your domain (https://example.com/robots.txt) that tells search engine crawlers which pages they can access and which they should ignore. It is one of the oldest web standards, dating to 1994, and every major search engine respects it. The file uses a simple syntax: a user-agent line identifying the crawler, followed by Allow/Disallow rules, and optional Sitemap and Crawl-delay directives.
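
For example, a minimal robots.txt might look like the sketch below. example.com and the /private/ paths are placeholders; substitute your own domain and directories.

# Applies to every crawler
User-agent: *
# Keep crawlers out of an internal area
Disallow: /private/
# Re-open one public subfolder inside it
Allow: /private/docs/
# Point crawlers at the XML sitemap
Sitemap: https://example.com/sitemap.xml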

Robots.txt is not a security mechanism — pages it disallows can still be accessed by typing the URL directly. What it does is signal to well-behaved crawlers that they should not waste crawl budget on the listed paths. For genuinely sensitive content, use authentication or noindex meta tags instead.

Robots.txt Directives Explained

Directive    | Purpose                                                       | Example
User-agent   | Identifies which crawler the rules apply to                   | User-agent: Googlebot
Disallow     | Blocks the crawler from the given path                        | Disallow: /admin/
Allow        | Explicitly permits a path (overrides a broader Disallow)      | Allow: /admin/public/
Sitemap      | Tells crawlers where to find your XML sitemap                 | Sitemap: https://example.com/sitemap.xml
Crawl-delay  | Requests a delay between requests (Bing, Yandex; not Google)  | Crawl-delay: 10

The asterisk (*) matches any sequence of characters: User-agent: * targets every crawler, and inside a path it matches any string (for example, Disallow: /*?sort= blocks any URL containing ?sort=). The dollar sign ($) anchors a pattern to the end of a URL. These are the only two pattern characters most major crawlers support.
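
As a quick sketch (the .pdf and sessionid examples are hypothetical), the two pattern characters are typically used like this:

User-agent: *
# $ anchors the match to the end of the URL, so this blocks only URLs ending in .pdf
Disallow: /*.pdf$
# * matches anything before the parameter, so this blocks the parameter on any path
Disallow: /*?sessionid=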

What to Disallow (and Why)

Admin and Login Paths

/admin/, /wp-admin/, /login/, /dashboard/ — these are internal tools, not content. Blocking them saves crawl budget and prevents admin pages from appearing in search results.
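
Assuming your site actually uses these paths, the matching rules are straightforward. The Allow line for admin-ajax.php is a common WordPress-specific exception, since some front-end features call it; include it only if that applies to you.

User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /login/
Disallow: /dashboard/
# WordPress only: front-end AJAX requests go through this file
Allow: /wp-admin/admin-ajax.php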

Checkout and Account Paths

/checkout/, /cart/, /my-account/ — personalised pages that should not be indexed. Also prevents Google from attempting to crawl them with empty session state.
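
With a typical e-commerce URL layout (the exact paths are assumptions), the rules look like:

User-agent: *
# Personalised, session-bound pages with no search value
Disallow: /checkout/
Disallow: /cart/
Disallow: /my-account/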

Internal Search Results

/search?, /?s= — every query creates a unique URL. Letting crawlers index them wastes budget on low-value pages and creates duplicate-content issues.
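
A sketch covering both path-style and parameter-style search URLs (/?s= is the WordPress convention; adjust to your own search parameter):

User-agent: *
# Path-plus-query internal search, e.g. /search?q=red+shoes
Disallow: /search?
# WordPress-style search on the site root, e.g. /?s=red+shoes
Disallow: /?s=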

Filter and Parameter URLs

/*?color=, /*?sort= — faceted navigation creates millions of parameter URLs. Block them in robots.txt to focus Google on canonical product pages.
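
A sketch for two hypothetical filter parameters. Note that /*?color= only matches when the parameter comes first in the query string, so a second rule with & catches it in any later position:

User-agent: *
# Parameter first in the query string: /dresses?color=red
Disallow: /*?color=
# Parameter after another one: /dresses?size=m&color=red
Disallow: /*&color=
Disallow: /*?sort=
Disallow: /*&sort=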

Dev and Staging Environments

If staging is on a subdomain (staging.example.com), block it with its own robots.txt containing Disallow: /. Robots.txt applies per host, so the production file cannot do this for you. This is critical: a staging site indexed alongside production creates duplicate-content nightmares.
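
A minimal file to serve on the staging host only (staging.example.com is a stand-in for your environment):

# robots.txt for staging.example.com ONLY; never deploy this to production
User-agent: *
Disallow: /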

Common Robots.txt Mistakes

Blocking CSS and JavaScript

Old SEO advice said to block /css/, /js/, and /images/. This is now wrong. Google needs to render your page (which requires loading CSS and JS) to evaluate mobile-friendliness and layout. Blocking these resources tanks rankings. Ensure .css, .js, and .png/.jpg files are accessible.
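
If an old file still blocks an asset folder, one common repair (the folder name and asset types here are assumptions) is to re-open the rendering-critical files with more specific Allow rules, which win under Google's longest-match precedence:

User-agent: *
# Legacy rule blocking a build-output folder
Disallow: /static/
# Longer, more specific Allow rules take precedence for CSS, JS and images
Allow: /static/*.css$
Allow: /static/*.js$
Allow: /static/*.png$
Allow: /static/*.jpg$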

Accidentally Blocking the Whole Site

User-agent: *
Disallow: /

This blocks every page on the site from every crawler. A surprisingly common mistake during site migrations — a staging robots.txt gets deployed to production. Always review robots.txt immediately after any deployment.
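
The correct "allow everything" file is only one character different, which is part of why this slips through review so easily:

User-agent: *
# An empty Disallow value blocks nothing, i.e. the whole site stays crawlable
Disallow: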

Using Robots.txt to Hide Sensitive Data

Robots.txt is publicly visible. Everyone can read it. Listing Disallow: /secret-page/ advertises the existence of that page to anyone curious. For sensitive content, use authentication, noindex meta tags, or remove the content entirely.

Missing Sitemap Reference

Every robots.txt should include a Sitemap: directive. Along with submitting the sitemap in Search Console, this is how crawlers discover it when they first crawl your site. Missing sitemap = slower discovery of new pages.

Inconsistent Slashes

Disallow: /admin matches /admin, /admin-panel, /adminUsers. Disallow: /admin/ only matches paths starting with /admin/. Be precise.

How to Test Your Robots.txt

After generating your file, test it before deploying:

  1. Download the generated file and save as robots.txt at the root of your domain.
  2. Verify it loads at yourdomain.com/robots.txt. If it 404s, you uploaded it to the wrong location.
  3. Check it in Google Search Console. The robots.txt report (which replaced the older robots.txt tester) shows how Google fetched and parsed your live file; the URL Inspection tool tells you whether a specific URL is blocked.
  4. Manually verify key pages. Check that your homepage, key category pages, and important blog posts are allowed.
  5. Submit your sitemap. After robots.txt references the sitemap, submit it in Search Console.

Robots.txt vs Meta Robots vs X-Robots-Tag

Method                                  | Scope                   | Best For
robots.txt                              | Crawling (access)       | Blocking crawl of entire sections
<meta name="robots" content="noindex">  | Indexing (inclusion)    | Allowing crawl but blocking search result inclusion
X-Robots-Tag HTTP header                | Indexing for non-HTML   | PDFs, images, non-HTML files

Key rule: do not use both Disallow in robots.txt AND noindex on the same page. If Google cannot crawl the page (due to Disallow), it cannot see the noindex tag either — and the URL may still show up in search results based on external links.

Related Crawl & Indexing Tools