What Are Robots Directives?
Robots directives are machine-readable instructions that tell search engine crawlers how to treat a specific page or resource. They answer two fundamental questions: should this page appear in search results, and should search engines follow the links it contains? Every page on your site receives an implicit answer to both questions, whether you set one deliberately or not. The default answer is "yes" to both, which means any page Google can crawl is eligible to be indexed and to have its links followed unless you explicitly say otherwise.
The term "robots directives" covers three distinct delivery mechanisms: the HTML meta robots tag, the X-Robots-Tag HTTP response header, and entries in your robots.txt file. Each mechanism works at a different level and serves different purposes. Understanding which to use — and when — is one of the most important technical SEO skills you can develop, because getting it wrong in either direction costs you either traffic (accidental noindex) or crawl budget (unnecessary crawling of junk pages).
This guide covers every major robots directive, explains how Google processes them, walks through the most common mistakes, and shows you how RankNibbler's robots directives checker surfaces problems before they cost you rankings.
The Meta Robots Tag: Syntax and All Supported Values
The robots meta tag sits inside the <head> element of an HTML page. Its basic syntax looks like this:
<meta name="robots" content="index, follow">
The name attribute can be set to robots (applies to all crawlers) or to a specific crawler such as googlebot, bingbot, or googlebot-news. The content attribute accepts a comma-separated list of directives. Values are not case-sensitive and whitespace around commas is ignored.
You can target Google specifically while leaving other crawlers on default behaviour:
<meta name="googlebot" content="noindex, follow">
index and noindex
index is the default and signals that Google may include the page in its search index. You do not need to write it explicitly, though some teams add it for clarity. noindex is the opposite: it instructs every compliant crawler not to include the page in search results. Once Google processes a noindex directive and re-crawls the page, the URL drops from the index. The drop is not instant — it depends on how frequently Googlebot revisits the URL — but it is reliable.
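In its simplest form, the directive is a single tag in the page head:

<meta name="robots" content="noindex">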
Use noindex on pages that have no search value: duplicate content variants, internal site-search result pages, thank-you pages, account dashboards, staging previews accidentally made public, paginated archive pages beyond a reasonable depth, and thin or auto-generated pages that dilute your site's perceived quality.
follow and nofollow
follow is the default and means Google will crawl the outgoing links on the page and pass PageRank through them. nofollow at the page level tells Google not to follow any link on that page. This is distinct from the rel="nofollow" attribute on individual anchor tags, which applies only to that specific link.
Using nofollow at the page level is relatively rare. You would use it on a page full of user-generated links you cannot individually vet, or on a gateway page that exists purely for humans and should not distribute link equity. Be aware that nofollow at the page level does not prevent Google from discovering those URLs through other means — it just means this particular page does not vouch for them.
A common and valid combination is noindex, follow. This tells Google: "Do not include this page in search results, but you may still follow the links and crawl where they lead." This is appropriate for category index pages that duplicate product listings, or for pages that serve a navigational purpose without offering unique content.
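The combined tag looks like this:

<meta name="robots" content="noindex, follow">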
noarchive
noarchive prevents search engines from showing a cached copy of your page in search results. When this directive is present, the "Cached" link that historically appeared next to search results is suppressed. Note that Google removed the cached link from search results in early 2024, so noarchive is less consequential for Google than it once was, but it still applies to Bing and other engines that display cached copies.
<meta name="robots" content="noarchive">
nosnippet
nosnippet tells Google not to show any text snippet, video preview, or information from the page in search results. The page may still appear as a result with its title and URL, but no descriptive text will be shown. This directive can reduce click-through rate significantly, so it is normally used only when the content owner has legal or commercial reasons to suppress previews — news licensing disputes being the classic example.
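The directive takes no value:

<meta name="robots" content="nosnippet">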
max-snippet
max-snippet:[n] lets you set a specific character limit on the text snippet Google can show for your page. If you set it to zero (max-snippet:0) it is equivalent to nosnippet. Setting it to -1 tells Google there is no limit and it may use whatever snippet length it considers appropriate. A value of 160, for example, caps the snippet at roughly one line of text:
<meta name="robots" content="max-snippet:160">
This directive is governed by Google's "Robots Meta Tag" specification and is supported by Googlebot but not universally by other crawlers.
max-image-preview
max-image-preview:[setting] controls the size of image previews Google can display in search results. The three accepted values are:
| Value | Meaning |
|---|---|
| none | No image preview is shown. |
| standard | A default-size thumbnail may be shown. |
| large | A larger image preview may be shown, including in Google Discover. |
Setting max-image-preview:large is recommended for editorial and news content because it enables larger thumbnails in Discover, which tend to increase click-through rates. If you omit this directive, Google chooses a preview size at its own discretion.
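On its own, the tag looks like this:

<meta name="robots" content="max-image-preview:large">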
max-video-preview
max-video-preview:[n] limits how many seconds of a video Google can show as an animated preview in search results. Set it to 0 to prevent any video preview, or -1 to allow previews of any length. A specific integer value (e.g., max-video-preview:10) caps the preview at that many seconds.
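For example, to cap animated previews at ten seconds:

<meta name="robots" content="max-video-preview:10">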
nositelinkssearchbox
nositelinkssearchbox tells Google not to show a sitelinks search box for your website in branded search results. This is a niche directive used when the site owner does not want users to search within the site directly from Google's results page.
notranslate
notranslate tells Google not to offer a translation of the page in search results. This is relevant for sites where the original language is important to the user experience and machine-translated versions would be misleading.
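Like any robots directives, these two niche values can be combined in a single tag:

<meta name="robots" content="nositelinkssearchbox, notranslate">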
noimageindex
noimageindex tells Google not to index any images on the page. The images will not appear in Google Images search results. The page itself may still be indexed normally. Use this if a page contains images that should remain out of image search (licensed photography, private product mockups, etc.).
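The page remains indexable; only its images are excluded:

<meta name="robots" content="noimageindex">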
unavailable_after
unavailable_after:[date] instructs Google to stop showing the page in search results after a specified date and time. This is useful for time-limited promotional pages, event announcements, or content with a known expiry date. The date should be in a widely recognised format such as RFC 822 or ISO 8601:
<meta name="robots" content="unavailable_after: 25 Jun 2026 15:00:00 GMT">
Complete Directives Reference
| Directive | Effect | Default? |
|---|---|---|
| index | Page may appear in search results. | Yes |
| noindex | Page must not appear in search results. | No |
| follow | Links on the page may be followed. | Yes |
| nofollow | Links on the page must not be followed. | No |
| noarchive | Do not show a cached copy in results. | No |
| nosnippet | Do not show a text or video snippet. | No |
| max-snippet:[n] | Limit text snippet to n characters. | No (-1 implied) |
| max-image-preview:[s] | Control image preview size (none/standard/large). | No |
| max-video-preview:[n] | Limit video preview to n seconds. | No (-1 implied) |
| nositelinkssearchbox | Hide the sitelinks search box. | No |
| notranslate | Do not offer translation in results. | No |
| noimageindex | Do not index images on the page. | No |
| unavailable_after:[date] | Remove from index after specified date. | No |
Robots.txt vs Meta Robots Tag: When to Use Which
This is one of the most misunderstood distinctions in technical SEO. Both mechanisms tell search engines something about how to handle your content, but they operate at entirely different levels and have fundamentally different effects.
Robots.txt is a server-level file that controls crawling. When you disallow a URL in robots.txt, you are telling crawlers not to fetch that resource at all. The page is not read, its meta tags are not processed, and its links are not followed. Crucially, the page can still appear in search results if other pages link to it — Google knows the URL exists even if it has never read the page's content. Such URLs typically surface as bare results with no snippet and, at best, a title inferred from anchor text.
The meta robots tag, by contrast, controls indexing. Google must be able to crawl the page to read the tag. If you block a page in robots.txt and also put a noindex tag in its HTML, Google will never see the noindex tag because it cannot access the page. The robots.txt block takes precedence for crawling, but the indexing status depends entirely on what Google can observe.
| Scenario | Use robots.txt | Use meta robots noindex |
|---|---|---|
| Block all crawlers from a large folder (images, scripts, admin) | Yes | No |
| Prevent a page from appearing in search results | No | Yes |
| Save crawl budget on low-value sections | Yes | Less effective |
| Remove a page that is already indexed | No — Google needs to crawl it to see noindex | Yes |
| Block crawling of internal search result pages | Yes | Also add noindex if accessible |
| Prevent indexing of a staging site | Both, as belt-and-braces | Primary method |
The safest approach for pages you never want indexed: set noindex in the meta robots tag and allow Googlebot to crawl the page so it can see and honour the directive. If you also need to reduce server load from crawling, add a robots.txt disallow — but understand that this does not guarantee removal from the index. For more detail, see our guide to what is robots.txt and use the robots.txt generator to build a valid file.
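As a sketch, a robots.txt that blocks crawling of low-value sections while leaving indexing control to meta tags might look like the following — the paths are illustrative and should match your own site structure:

# Controls crawling only; it cannot noindex a URL
User-agent: *
Disallow: /admin/
Disallow: /internal-search/

Sitemap: https://www.example.com/sitemap.xml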
The X-Robots-Tag HTTP Header
The X-Robots-Tag is an HTTP response header that carries the same directives as the meta robots tag but is delivered at the server level rather than in the HTML body. This makes it the only way to apply robots directives to non-HTML resources such as PDF files, images, video files, and JavaScript files — resources that have no HTML <head> section.
An example X-Robots-Tag header as it appears in an HTTP response:
HTTP/1.1 200 OK
Content-Type: application/pdf
X-Robots-Tag: noindex
You can also target specific crawlers in the header:
X-Robots-Tag: googlebot: noindex, nofollow
In Apache, you can add an X-Robots-Tag to all PDF responses using a directive in your .htaccess file:
<FilesMatch "\.(pdf)$">
  Header set X-Robots-Tag "noindex"
</FilesMatch>
In Nginx, the equivalent configuration in your server block would be:
location ~* \.pdf$ {
add_header X-Robots-Tag "noindex";
}
When both a meta robots tag and an X-Robots-Tag header are present, Google combines the directives. If the header says noindex and the meta tag says index, the result is noindex because Google applies the most restrictive interpretation across all discovered directives. This is an important debugging point: if a page stubbornly refuses to index despite having a clean meta robots tag, always check whether an X-Robots-Tag header is being served.
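A quick way to run that check from a terminal — substitute your own URL — is:

# Fetch only the response headers and filter for X-Robots-Tag (case-insensitive)
curl -sI https://www.example.com/page | grep -i x-robots-tag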
How Google Processes Robots Directives
Google's handling of robots directives is more nuanced than the specification alone suggests. Understanding the processing pipeline helps you predict how your changes will take effect and debug situations where Google's behaviour does not match your intent.
1. Crawl access is checked first
Before Googlebot can read any meta tag or HTTP header, it must be permitted to fetch the URL. If robots.txt disallows the Googlebot user-agent for that path, Googlebot will not request the page. Any directives in the HTML are invisible to Google. This is why blocking pages with robots.txt while expecting noindex to work is a common and consequential mistake.
2. HTTP headers are read before HTML
When Googlebot does fetch a URL, the server returns HTTP headers before the HTML body. The X-Robots-Tag header (if present) is processed at this stage. If it contains a noindex directive, Google can honour it without reading the full HTML.
3. The meta robots tag is parsed from the HTML head
Googlebot reads the <head> section of the HTML. If a meta robots tag is present, its directives are recorded. Google uses a lenient parser and handles minor syntax variations (extra spaces, mixed case), but malformed tags — such as using http-equiv instead of name — may not be recognised. Always use name="robots".
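For example, the first tag below uses the wrong attribute and may be ignored, while the second is the correct form:

<!-- May not be recognised: http-equiv is the wrong attribute -->
<meta http-equiv="robots" content="noindex">

<!-- Correct -->
<meta name="robots" content="noindex">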
4. Conflicting directives are resolved restrictively
If the page contains multiple robots meta tags, or a combination of meta tags and X-Robots-Tag headers, Google combines all directives and applies the most restrictive interpretation. Two tags pointing in opposite directions will always result in the more restrictive one winning:
<!-- Together these resolve to: noindex, nofollow -->
<meta name="robots" content="noindex, follow">
<meta name="robots" content="index, nofollow">
5. Directives are not honoured instantly
Adding a noindex tag does not remove a page from Google's index the moment you save the file. Googlebot must recrawl the URL, see the directive, and the indexing system must then process and act on that signal. For pages Googlebot visits frequently, this can happen within hours. For rarely crawled pages, it can take weeks. If you need faster removal, use the URL Removal tool in Google Search Console alongside the noindex tag. See our guide to how to remove a page from Google.
6. Robots.txt can cause "crawled but not indexed" to persist
A page blocked by robots.txt but linked from other pages will remain in Google's index as a URL it knows about but cannot read. Google may even surface it as a bare result with no snippet. To remove such a page from the index, you must allow crawling (so Google can see your noindex tag) and then add the noindex directive. Removing the robots.txt block without adding noindex may simply let Google crawl the page and index its content in full.
Robots Directives and Canonical Tags
The canonical tag (rel="canonical") and the noindex directive serve related but distinct purposes, and combining them incorrectly can send contradictory signals that confuse crawlers.
A canonical tag on a page says: "The preferred version of this content lives at [URL]." A noindex on the same page says: "Do not include this page in the index." These are not inherently contradictory — you might have a paginated product listing that canonicalises to page 1 and also carries noindex to ensure only page 1 appears in results. That is a coherent signal.
Where it gets problematic:
- Canonical pointing to a noindexed page: If the canonical destination itself has noindex, you are telling Google that the preferred version of this content should not be indexed. The result is that neither version gets indexed. This is a common mistake during site restructures where a new canonical target was set before the noindex was removed.
- Noindex on a page with inbound canonicals from other pages: Other pages may be consolidating their indexing value into a URL that you then noindex. The pages pointing to it are handing off their indexing value to a page you want suppressed. You typically want to remove those canonical references before noindexing the target.
- Canonical and robots.txt block on the same URL: If you disallow a URL in robots.txt and other pages have canonical tags pointing to it, Googlebot cannot crawl the canonical destination to verify it is the right choice. This weakens the canonical signal significantly.
As a rule: if you noindex a page, remove any self-referencing canonicals on that page and update other pages that canonicalise to it. If you want to suppress duplicate content without noindex, use the canonical tag alone on the duplicate pages pointing to the preferred version.
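In the canonical-only case, each duplicate page carries a single line pointing at the preferred version (URL illustrative) and no noindex:

<link rel="canonical" href="https://www.example.com/preferred-version/">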
Common Robots Directive Mistakes
Robots directive errors are responsible for a disproportionate share of serious indexing problems. The following are the most frequently encountered mistakes in real-world site audits.
Accidentally noindexing the entire site during development
Most CMS platforms — WordPress, Shopify, Magento, and others — have a global "Discourage search engines" or "private" setting that adds noindex to every page. This is intended for use during development and staging. It is consistently forgotten when the site goes live. The result is a site that looks normal to visitors but has told Google to exclude every single page from its index.
In WordPress, the setting is found at Settings > Reading > "Discourage search engines from indexing this site." In Shopify, it is the "password protect" toggle. In both cases, the fix is one checkbox, but finding the cause after rankings disappear can take hours if you do not know where to look. Run a robots directives check on your homepage immediately after any major deployment or CMS configuration change.
Staging environments indexed in production
Staging sites that are publicly accessible and not protected by noindex or a robots.txt block can be crawled and indexed. This creates duplicate content issues and can dilute the authority of your production URLs. Always set global noindex on staging, ideally at the server level via X-Robots-Tag so it applies universally and does not depend on CMS settings that developers might override.
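A minimal sketch of this in Nginx, assuming a dedicated staging hostname, applies the header to every response the server sends:

server {
    server_name staging.example.com;

    # "always" emits the header on every response code, independent of CMS settings
    add_header X-Robots-Tag "noindex, nofollow" always;

    # ... rest of the staging configuration
}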
Template-level noindex left in production
A developer adds noindex to a page template to keep it out of search results during build or testing. The template is later deployed to production without removing the directive. Every page built from that template is now noindexed. This is especially common with blog post templates, product page templates, and landing page builders that use shared layout files.
noindex combined with robots.txt disallow
As discussed above: if robots.txt blocks a URL, Googlebot never sees the noindex tag. If a page is currently indexed and you want to remove it, you must allow crawling before (or simultaneously with) adding noindex. A common repair sequence: add noindex, wait for recrawl and removal from index, then add robots.txt disallow if you also want to stop crawl spend on that URL.
Noindex on paginated pages with canonical errors
Some sites add noindex to paginated archive pages (page 2, page 3, etc.) to prevent thin index bloat. This is a legitimate approach, but it breaks down when those pages also have self-canonical tags or when products on page 2+ are not accessible from page 1. Google recommends allowing pagination to be indexed and crawled freely, using good internal linking to ensure product discovery.
Missing meta robots on dynamically generated pages
JavaScript-rendered applications sometimes fail to inject meta robots tags correctly. If the tag is added by client-side JavaScript after page load, Googlebot may not see it, especially if rendering is deferred. Important directives like noindex should be present in the server-rendered HTML, not added exclusively by JavaScript.
Noindex on pages included in the XML sitemap
Including a URL in your sitemap is a signal that the page is important and should be indexed. Having a noindex tag on that same URL is a direct contradiction. While Google can handle the conflict (noindex wins), the contradiction wastes your sitemap's effectiveness and generates Search Console warnings. Remove noindexed URLs from your sitemap.
How to Check Robots Directives
There are several methods to check the robots directives on a page, from simple manual inspection to automated crawl-based auditing.
Browser developer tools
Open any page in Chrome, right-click and select "View Page Source" (Ctrl+U / Cmd+U), then search for meta name="robots". This shows you the HTML as delivered to the browser, though it will not show server-side X-Robots-Tag headers.
To see HTTP headers including X-Robots-Tag, open Chrome DevTools (F12), go to the Network tab, reload the page, click the document request, and look under the Response Headers section.
Google Search Console URL Inspection
The URL Inspection tool in Google Search Console shows you how Google last crawled and rendered a page. Under "Indexing," it explicitly reports whether noindex was detected and which mechanism triggered it (meta tag or HTTP header). This is the most authoritative check because it shows what Google's own crawler saw, including JavaScript-rendered changes.
Screaming Frog and other crawl tools
Crawl tools like Screaming Frog enumerate the meta robots content for every URL they visit and let you filter by directive. This is useful for site-wide audits where you need to find all noindexed pages in bulk rather than checking one at a time.
RankNibbler robots directives checker
RankNibbler provides an instant, free robots directives check as part of its full site audit. Enter any URL and RankNibbler fetches the page, reads the HTTP headers, parses the HTML head, and reports the complete set of robots directives in place. No browser extension or crawl software is required.
How RankNibbler Checks and Scores Robots Directives
RankNibbler's robots directives checker runs five distinct checks against every URL it audits. Each check targets a specific category of problem, and together they produce a clear, actionable picture of your page's indexing posture.
1. Meta robots tag presence and content
RankNibbler reads the raw HTML returned by the server and parses all <meta name="robots"> and <meta name="googlebot"> tags in the document <head>. It extracts the full content attribute value and reports every directive present. If no meta robots tag exists, the tool reports the default implicit state (index, follow) so you know the page is operating on defaults rather than explicit control.
2. Noindex detection and warning
If any parsed tag contains noindex, RankNibbler flags this with a high-visibility warning. This is the most critical check in the robots directives audit because noindex directly controls whether the page can appear in Google's search results. The flag appears regardless of whether the directive targets all robots or a specific crawler, and the tool reports which tag was responsible.
3. Nofollow detection
A page-level nofollow directive is flagged as an advisory notice rather than an error, because nofollow at the page level is legitimately used in some scenarios. RankNibbler reports it so you can confirm the directive is intentional. Accidental nofollow at the page level can interrupt your internal linking graph and prevent PageRank from flowing to deeper pages.
4. Conflict detection between tags
When multiple meta robots tags are present on the same page — a situation that can arise with tag manager insertions, theme-level defaults, and plugin-added tags — RankNibbler reports all tags and highlights any that carry conflicting directives. Two tags that combine to produce a more restrictive outcome than either alone are flagged so you know which tag to investigate and remove.
5. Consistency check against sitemap and canonical
RankNibbler cross-references the robots directives against other on-page signals. If a page carries a noindex tag but also contains a self-referencing canonical, or if the page is known to be included in the site's XML sitemap, the tool flags the inconsistency. These inconsistencies do not break indexing outright, but they represent contradictory signals that waste crawl effort and generate Search Console warnings.
Run your pages through the RankNibbler homepage or set up a full site audit to catch these issues across your entire URL inventory. For understanding what happens after indexing, see how to check if Google has indexed your page.
Debugging Indexing Issues with Robots Directives
When a page that should be indexed is missing from Google's search results, robots directives are always the first thing to check. Work through this diagnostic sequence before investigating other potential causes.
Step 1: Confirm the page is actually not indexed
Search Google for site:yourdomain.com/exact-path. If the page does not appear, it may be noindexed, blocked by robots.txt, penalised, or simply not yet crawled. Use the URL Inspection tool in Google Search Console for a definitive answer — it tells you whether the page is in the index, whether it was excluded, and the reason. See our guide to how to check if a page is indexed for the full process.
Step 2: Check for noindex in the meta robots tag
View the page source (Ctrl+U) and search for robots. Look for any meta tag with noindex in its content. Also search for googlebot to check for Googlebot-specific directives. Do not rely on the rendered view — check the source HTML to see what was delivered before any JavaScript modifications.
Step 3: Check the X-Robots-Tag HTTP header
Open Chrome DevTools, go to Network, reload the page, click the HTML document request, and check Response Headers for any X-Robots-Tag entry. This is easy to miss because it does not appear in the HTML at all — it only exists in the HTTP response.
Step 4: Check robots.txt
Visit yourdomain.com/robots.txt directly and look for Disallow entries that match your URL path. Use the robots.txt report in Google Search Console to confirm whether a specific URL is blocked. Remember: a robots.txt block does not cause removal from the index, but it does prevent recrawling and therefore prevents Google from discovering a noindex tag you may have added.
Step 5: Use Google Search Console URL Inspection
The URL Inspection tool shows you the last crawl date, render screenshot, and explicit indexing status. If Google reports "Page is not indexed" with a reason of "Excluded by 'noindex' tag," you have your answer. If the reason is "Crawled — currently not indexed," the issue is not a directive but a quality or content problem. If the reason is "Blocked by robots.txt," see Step 4.
Step 6: Request indexing after fixing the issue
Once you have identified and fixed the directive causing exclusion, use the URL Inspection tool to request indexing. This does not guarantee immediate indexing, but it does tell Google that the page is ready to be recrawled. For urgent cases — such as recovering from an accidental noindex — this expedites the re-inclusion process.
For the full workflow on removing pages from Google, see how to remove a page from Google.
Robots Directives for Specific CMS Platforms
The way robots directives are managed varies significantly across CMS platforms. Knowing where to look in your specific system saves time when auditing.
WordPress
WordPress's built-in "Discourage search engines" option at Settings > Reading adds a noindex header site-wide. Individual post and page noindex settings are typically managed through an SEO plugin. In Yoast SEO, the "Advanced" tab on each post or page has a "Search engine visibility" option. In Rank Math, the equivalent is under the "Advanced" panel. Both plugins also support bulk editing of meta robots via their respective sitemap and bulk edit interfaces. RankNibbler detects the output regardless of which plugin or theme generates it.
Shopify
Shopify adds noindex to several page types by default: faceted filter pages, collection+tag combination pages, and internal search result pages. These defaults are sensible because those pages are typically duplicate or near-duplicate content. Custom theme modifications can accidentally override these defaults, and third-party apps sometimes inject their own meta robots values. Always check robots directives after installing a new Shopify app or updating your theme.
Wix, Squarespace, Webflow
Website builders typically have a page-level "Hide from search engines" toggle or SEO settings panel. These set noindex at the individual page level. Check each platform's documentation for its exact implementation, and verify the output with RankNibbler because the rendered HTML sometimes differs from what the settings panel implies.
Advanced Use Cases
Noindex for A/B test variants
When running A/B tests that serve different page variants on separate URLs, add noindex to the variant URLs to prevent them from competing with the canonical version in search results. Once the test is complete and a winner is selected, either redirect the variant URLs to the canonical or keep the noindex if they serve a permanent purpose.
Noindex for geo-targeted landing pages
Sites that create city-specific or region-specific landing pages at scale sometimes end up with hundreds of near-identical pages. If those pages do not have genuinely distinct and useful content, they dilute overall site quality. Adding noindex to thin geo-pages while keeping noindex off the substantive ones is a common quality management strategy — though the better long-term solution is to improve the content rather than suppress it.
Crawler-specific directives
You can apply directives to specific crawlers only. This is most useful when you want a page to appear in Google but not in Bing, or when you want to allow Googlebot-News but restrict the general Googlebot:
<meta name="googlebot" content="index, follow"> <meta name="bingbot" content="noindex">
A crawler-specific tag is honoured only by the named crawler; crawlers without a specific tag fall back to the generic robots tag. If both a generic and a specific tag exist, the specific one takes precedence for the named crawler.
Controlling Google Discover with max-image-preview
Google Discover is a content feed served on Android devices and in the Google app. To be eligible for large image previews in Discover — which significantly increases click-through rate — your pages must set max-image-preview:large or equivalent. Many sites miss this because it is not required for regular search, only for Discover. Add it to article and blog post templates to maximise Discover eligibility:
<meta name="robots" content="index, follow, max-image-preview:large, max-snippet:-1">
Robots Directives and the SEO Glossary
The terminology around robots directives can be confusing because different tools and resources use slightly different terms. For a complete reference of SEO terminology including crawling, indexing, canonicalisation, and crawl budget, see the RankNibbler SEO glossary. For a deeper dive into the broader context of on-page optimisation where robots directives sit alongside title tags, meta descriptions, and structured data, see what is on-page SEO.
Frequently Asked Questions
What happens if I have no meta robots tag at all?
If no meta robots tag is present, search engines apply the default behaviour: index, follow. The page is eligible for indexing and all outgoing links are followed. The absence of a robots tag is not an error — it simply means you are relying on the default, which is the correct state for most pages. You only need to add a tag when you want to deviate from the default.
Does nofollow on the page level affect individual link rel="nofollow" attributes?
The two operate independently. A page-level nofollow in the meta robots tag applies to every link on the page, which makes individual rel="nofollow" attributes redundant. Conversely, if the page is set to follow (the default) but a specific link carries rel="nofollow", only that link is nofollowed. You do not need a page-level nofollow to nofollow individual links.
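For reference, a link-level nofollow is set on the anchor itself (URL illustrative):

<a href="https://www.example.com/untrusted-page/" rel="nofollow">Example link</a>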
Can I use noindex and follow together?
Yes. noindex, follow is a valid and useful combination. It tells search engines not to include the page in results but to still crawl and follow the links on the page. This is appropriate for pages that exist primarily for navigation — category hubs, internal search results — where you want the link graph to be traversable without the pages themselves appearing in results.
Will noindex remove a page from Google immediately?
No. Adding noindex tells Google what to do the next time it crawls the page. Removal from the index happens after Googlebot recrawls the page, sees the directive, and the indexing system processes it. This can take anywhere from a few hours (for frequently crawled pages) to several weeks (for rarely crawled pages). If you need faster removal, use the URL Removal tool in Google Search Console alongside the noindex tag.
What is the difference between robots.txt disallow and noindex?
Robots.txt disallow stops Googlebot from fetching the page. Noindex stops Google from including the page in its search index. A disallowed page can still appear in search results (as a URL-only result with no title or snippet) if other pages link to it. A noindexed page (that Googlebot can crawl) will be removed from the index entirely. For full exclusion, use noindex — and ensure Googlebot can crawl the page to see it.
Can a page with noindex still receive link equity?
Google's documented position is that a noindexed page can still pass PageRank through its outbound links if those links are set to follow. In practice, pages that are noindexed tend to be crawled less frequently over time, which reduces how reliably their links are processed. If link equity distribution is important to you, it is better to use canonical tags or remove the noindex rather than relying on a noindexed page to pass link value.
My page has noindex but still appears in Google. Why?
There are several possible reasons. First, Googlebot may not have recrawled the page yet after you added noindex — it takes time for the directive to be processed. Second, there may be a robots.txt block preventing Googlebot from fetching the page, meaning it cannot see the noindex tag. Third, a JavaScript-based noindex injection may not be visible to Googlebot if it is not rendering JavaScript for that URL. Check the URL Inspection tool in Google Search Console to see what Google's crawler actually saw during its last visit.
Should I put noindex in robots.txt or in the meta tag?
Noindex is not a valid robots.txt directive. The standard protocol supports User-agent, Allow, and Disallow rules, plus the Sitemap extension; Crawl-delay is a nonstandard addition that Google ignores. An unofficial Noindex directive in robots.txt was never formally adopted by Google, and Google stopped honouring it entirely in 2019. Always use the meta robots tag or X-Robots-Tag header for noindex instructions, not robots.txt.
Does nofollow in the meta robots tag affect SEO negatively?
Applying nofollow at the page level prevents PageRank from flowing out of that page to other pages on your site. If applied accidentally to pages with strong internal links, this can deprive important pages of link equity that would otherwise flow to them. It does not directly penalise the page itself, but it can reduce the effective PageRank of pages downstream in your internal linking hierarchy. Always audit page-level nofollow directives to confirm they are intentional.
How do I check if an X-Robots-Tag header is being served?
The easiest method is Chrome DevTools: open the Network tab, reload the page, click the HTML document in the request list, and look at the Response Headers panel. You can also use curl in a terminal: curl -I https://yourdomain.com/page which returns only the HTTP headers. RankNibbler's site audit checks HTTP headers as part of its robots directives analysis and will surface any X-Robots-Tag values it finds.
Start Checking Your Robots Directives
Robots directives are among the highest-impact, lowest-visibility settings in technical SEO. A single misconfigured tag can exclude an entire site from search results, and because the tag sits in the HTML head rather than in visible content, it is easy to miss during manual reviews.
RankNibbler's free robots directives checker gives you an immediate read on the indexing and crawling posture of any URL. Run it on your most important pages now, then set up a regular site audit to catch regressions automatically.
- Check a URL now — free robots directives check with full on-page SEO audit
- Run a site audit — find robots directive issues across your full URL inventory
- Build a robots.txt file — generate a valid robots.txt with the correct syntax
- Understand robots.txt — learn how robots.txt and meta robots work together
- Learn about indexing — understand what indexing means and why it matters for search visibility
- Remove a page from Google — step-by-step guide to deindexing a URL
- Check if a page is indexed — confirm whether Google has indexed your pages
- SEO glossary — definitions for crawling, indexing, canonicalisation and more