What Is Robots.txt?
Robots.txt is a text file at the root of your website (e.g. yoursite.com/robots.txt) that tells search engine crawlers which pages they are allowed or not allowed to access. It is part of the Robots Exclusion Protocol and is one of the first files a crawler checks when visiting your site.
How Robots.txt Works
When Googlebot arrives at your site, it first requests /robots.txt. If the file exists, the crawler follows the rules inside it. If it does not exist, the crawler assumes it can access everything.
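Python's standard library ships a robots.txt parser that answers the same question a well-behaved crawler asks before each request. A minimal sketch, using an illustrative rule set:

```python
from urllib import robotparser

# Illustrative robots.txt rules: block /admin/ for every crawler.
rules = """\
User-agent: *
Disallow: /admin/
"""

parser = robotparser.RobotFileParser()
parser.parse(rules.splitlines())

# can_fetch() is the check a crawler performs before requesting a URL.
print(parser.can_fetch("Googlebot", "https://example.com/admin/users"))  # False
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))    # True
```

In a real crawler you would call `parser.set_url("https://example.com/robots.txt")` followed by `parser.read()` to fetch the live file instead of parsing a string.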
Important: robots.txt controls crawling, not indexing. A page blocked by robots.txt can still appear in search results if other pages link to it. To keep a page out of the index, use a noindex robots meta tag instead (`<meta name="robots" content="noindex">`), and note that the page must remain crawlable for Google to see the tag (you can verify your setup with a robots directives checker).
Common Robots.txt Directives
| Directive | Example | Meaning |
|---|---|---|
| User-agent | User-agent: * | Applies to all crawlers |
| Disallow | Disallow: /admin/ | Do not crawl this path |
| Allow | Allow: /admin/public/ | Override a disallow for this path |
| Sitemap | Sitemap: https://site.com/sitemap.xml | Location of the sitemap |
| Crawl-delay | Crawl-delay: 10 | Wait 10 seconds between requests (not used by Google) |
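The directives in the table can be combined in one file and inspected with `urllib.robotparser`. One caveat in this sketch: Python's parser applies rules in file order (first match wins), so the Allow line is placed before the Disallow it overrides; Google instead resolves conflicts by longest-path match regardless of order.

```python
from urllib import robotparser

# Illustrative rule set combining the directives from the table above.
rules = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/
Crawl-delay: 10
Sitemap: https://site.com/sitemap.xml
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("*", "/admin/secret"))       # False: blocked by Disallow
print(rp.can_fetch("*", "/admin/public/page"))  # True: re-allowed by Allow
print(rp.crawl_delay("*"))                      # 10
print(rp.site_maps())                           # ['https://site.com/sitemap.xml']
```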
Example Robots.txt
```
User-agent: *
Allow: /
Disallow: /admin/
Disallow: /checkout/
Disallow: /search?
Sitemap: https://www.example.com/sitemap.xml
```
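Google resolves conflicts between Allow and Disallow by longest-path match: the most specific matching rule wins, and ties go to Allow. A minimal sketch of that precedence applied to the example rules above (simplified: no wildcards or user-agent groups):

```python
# Google-style rule resolution (RFC 9309): the rule with the longest
# matching path wins; on a tie, Allow wins. Wildcard support omitted.
RULES = [
    ("allow", "/"),
    ("disallow", "/admin/"),
    ("disallow", "/checkout/"),
    ("disallow", "/search?"),
]

def is_allowed(path: str) -> bool:
    # (match length, is_allow) pairs for every rule whose path is a prefix.
    matches = [(len(p), kind == "allow") for kind, p in RULES if path.startswith(p)]
    if not matches:
        return True  # no rule matches: crawling is allowed by default
    # max() picks the longest match; True > False breaks ties toward Allow.
    return max(matches)[1]

print(is_allowed("/blog/post"))       # True  (only "Allow: /" matches)
print(is_allowed("/admin/users"))     # False (longest match is "/admin/")
print(is_allowed("/search?q=shoes"))  # False
```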
Common Mistakes
- Blocking CSS and JS: Google needs to render your pages, so do not block stylesheets or scripts
- Blocking your entire site: Disallow: / blocks everything. Only use this on staging sites
- Using robots.txt to hide pages: blocked pages can still be indexed. Use noindex instead
- No sitemap directive: always include your sitemap URL in robots.txt
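These mistakes are easy to catch automatically. A quick lint pass over a robots.txt body, with illustrative checks and messages (not an official tool):

```python
def lint_robots(text: str) -> list[str]:
    """Flag the common robots.txt mistakes described above."""
    warnings = []
    lines = [line.strip() for line in text.splitlines()]
    # Mistake: no Sitemap directive anywhere in the file.
    if not any(line.lower().startswith("sitemap:") for line in lines):
        warnings.append("No Sitemap directive found")
    # Mistake: a site-wide block (exact-match check; a real tool would
    # also handle casing and whitespace variants).
    if "Disallow: /" in lines:
        warnings.append("'Disallow: /' blocks the entire site")
    # Mistake: blocking CSS or JS that Google needs for rendering.
    for line in lines:
        if line.lower().startswith("disallow:") and (".css" in line or ".js" in line):
            warnings.append(f"Possible CSS/JS block: {line}")
    return warnings

print(lint_robots("User-agent: *\nDisallow: /\n"))
```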
The RankNibbler site audit checks your robots.txt for sitemap references when discovering your pages.
Last updated: March 2026