Your robots.txt file is one of the first things search engine crawlers look for when they visit your site. A single wrong rule can accidentally hide your entire website from Google. A well-configured robots.txt, on the other hand, keeps crawlers focused on your important pages and away from duplicate content, admin panels, and internal search results.
This guide covers everything from basic syntax to advanced directives. Whether you are setting up your first robots.txt or auditing an existing one, you will learn exactly what each rule does, how to avoid the most dangerous mistakes, and how to generate a clean robots.txt file in seconds with our free robots.txt generator.
Robots.txt is a plain text file that lives at the root of your website — https://yoursite.com/robots.txt. It uses the Robots Exclusion Protocol to communicate with search engine crawlers (Googlebot, Bingbot, etc.) about which parts of your site they should and should not access.
Important clarification: robots.txt controls crawling, not indexing. Blocking a URL in robots.txt prevents crawlers from visiting the page, but Google can still index the URL if other sites link to it. The indexed result will just show the URL with no description. To fully prevent indexing, use a noindex meta tag on the page itself.
Every robots.txt file uses the same simple structure: one or more groups of rules, each starting with a User-agent line.
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /admin/public-page/
User-agent: Googlebot
Disallow: /tmp/
Sitemap: https://yoursite.com/sitemap.xml
- User-agent — specifies which crawler the rules apply to. Use * for all crawlers.
- Disallow — blocks crawlers from the specified path. Disallow: /admin/ blocks everything under /admin/.
- Allow — overrides a Disallow rule for a specific path. Useful for allowing a specific page inside a blocked directory.
- Sitemap — tells crawlers where your XML sitemap is located. This can appear anywhere in the file.
- Crawl-delay — asks crawlers to wait a set number of seconds between requests (not supported by Google).

Google and Bing support two wildcard characters:
- * — matches any sequence of characters. Example: Disallow: /*.pdf$ blocks all PDF files.
- $ — matches the end of a URL. Without it, /page matches /page, /page2, and /pages/anything.

User-agent: *
Disallow: /
This is the nuclear option. Only use this for staging sites, development environments, or sites that are genuinely private. This single rule blocks crawlers from every page, and your site will gradually drop out of search results.
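The * and $ wildcard semantics described earlier are easy to misjudge by eye. Here is a minimal sketch of a Google-style pattern matcher in Python; robots_pattern_to_regex is a hypothetical helper written for illustration, not part of any standard library:

```python
import re

def robots_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a robots.txt path pattern into a regex.

    '*' matches any sequence of characters; a trailing '$'
    anchors the match to the end of the URL.
    """
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.compile("^" + regex)

rule = robots_pattern_to_regex("/*.pdf$")
print(bool(rule.match("/files/report.pdf")))      # True: URL ends in .pdf
print(bool(rule.match("/files/report.pdf?v=2")))  # False: '$' anchors the end
```

Because the $ in /*.pdf$ anchors the match, a PDF URL with query parameters slips past the rule — one reason to test wildcard patterns rather than trust intuition.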
User-agent: *
Disallow:
An empty Disallow directive means "allow everything." This is effectively the same as not having a robots.txt file, but it explicitly signals to crawlers that you have considered access control.
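You can confirm this behavior offline with Python's standard-library urllib.robotparser, which treats an empty Disallow as allow-all (yoursite.com is a placeholder):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])  # empty Disallow = allow everything

# Any URL is crawlable under this policy.
print(rp.can_fetch("Googlebot", "https://yoursite.com/any/page"))  # True
```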
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /login/
Disallow: /cart/
Disallow: /checkout/
These pages have no SEO value and waste crawl budget. Block them so crawlers spend their time on your content pages instead.
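A quick offline sanity check of rules like these is possible with Python's urllib.robotparser. Note that it uses simple prefix matching, not Google's full wildcard semantics, and the URLs below are placeholders:

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /cart/",
]
rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://yoursite.com/admin/users"))  # False: blocked
print(rp.can_fetch("*", "https://yoursite.com/blog/post"))    # True: crawlable
```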
User-agent: *
Disallow: /search
Disallow: /?s=
Disallow: /*?q=
Internal search result pages create near-infinite thin content URLs. Google specifically warns against letting these get indexed.
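A sketch of checking the first two rules with Python's urllib.robotparser — its prefix matching happens to cover /search and /?s=, though it does not support the /*?q= wildcard form, so that one still needs testing in Google's own tools:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /search", "Disallow: /?s="])

# Internal search result URLs are caught by the prefix rules.
print(rp.can_fetch("*", "https://yoursite.com/search?q=shoes"))  # False
print(rp.can_fetch("*", "https://yoursite.com/?s=shoes"))        # False
print(rp.can_fetch("*", "https://yoursite.com/shop"))            # True
```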
Crawl-delay tells crawlers to wait a certain number of seconds between requests:
User-agent: *
Crawl-delay: 10
This asks crawlers to wait 10 seconds between each page request. This is useful for small servers that cannot handle aggressive crawling.
Important: Google ignores Crawl-delay entirely. To control Googlebot's crawl rate, use Google Search Console's crawl rate settings. Bing, Yandex, and some other crawlers do respect this directive.
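Crawlers that honor the directive can read it programmatically. For example, Python's urllib.robotparser exposes it via crawl_delay() (Python 3.6+):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Crawl-delay: 10"])

# A polite crawler would sleep this many seconds between requests.
print(rp.crawl_delay("*"))  # 10
```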
Adding a Sitemap directive to your robots.txt helps crawlers discover your XML sitemap, even if they have not found it through other means:
Sitemap: https://yoursite.com/sitemap.xml
Sitemap: https://yoursite.com/sitemap-posts.xml
Sitemap: https://yoursite.com/sitemap-pages.xml
You can list multiple sitemaps. Use the full absolute URL including the protocol. This directive can appear anywhere in the file — it does not need to be inside a User-agent group.
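When auditing a file, Python's urllib.robotparser surfaces these declarations via site_maps() (Python 3.8+); the URLs here are placeholders:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
    "Sitemap: https://yoursite.com/sitemap.xml",
    "Sitemap: https://yoursite.com/sitemap-posts.xml",
])

# Lists every Sitemap directive found anywhere in the file.
print(rp.site_maps())
```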
Always test before deploying. A single typo can block Google from your entire site.
- Visit yoursite.com/robots.txt in a browser and read through every rule.
- Make sure Disallow: / does not appear under User-agent: *; it is the most common catastrophic error.
- Remember that www.yoursite.com and blog.yoursite.com are different hosts. Each needs its own robots.txt.

Generate a clean robots.txt file in seconds — just toggle the rules you need.
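Part of this pre-deploy checking can be automated. A sketch using Python's standard urllib.robotparser (prefix matching only; yoursite.com, the candidate rules, and the must_crawl list are placeholders you would replace with your own):

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt content, checked before it goes live.
candidate = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""

# URLs that must stay crawlable; extend with your key pages.
must_crawl = [
    "https://yoursite.com/",
    "https://yoursite.com/blog/",
]

rp = RobotFileParser()
rp.parse(candidate.splitlines())

for url in must_crawl:
    status = "OK" if rp.can_fetch("Googlebot", url) else "BLOCKED"
    print(f"{status}: {url}")
```

Any "BLOCKED" line for a page you care about means the file should not ship.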
Open Robots.txt Generator

Our free robots.txt generator lets you build a robots.txt file without memorizing syntax. Toggle common rules on and off, add custom paths, include your sitemap URL, and copy the finished file. Everything runs in your browser with no data sent anywhere.
Once your robots.txt is configured, make sure the rest of your SEO fundamentals are in place. Use our meta tag generator to create proper title tags, descriptions, and Open Graph tags for every page on your site.