Your robots.txt file is one of the first things search engine crawlers look for when they visit your site. A single wrong rule can accidentally hide your entire website from Google. A well-configured robots.txt, on the other hand, keeps crawlers focused on your important pages and away from duplicate content, admin panels, and internal search results.
This guide covers everything from basic syntax to advanced directives. Whether you are setting up your first robots.txt or auditing an existing one, you will learn exactly what each rule does, how to avoid the most dangerous mistakes, and how to generate a clean robots.txt file in seconds with our free robots.txt generator.
Robots.txt is a plain text file that lives at the root of your website — https://yoursite.com/robots.txt. It uses the Robots Exclusion Protocol to communicate with search engine crawlers (Googlebot, Bingbot, etc.) about which parts of your site they should and should not access.
Important clarification: robots.txt controls crawling, not indexing. Blocking a URL in robots.txt prevents crawlers from visiting the page, but Google can still index the URL if other sites link to it. The indexed result will just show the URL with no description. To fully prevent indexing, use a noindex meta tag on the page itself.
Every robots.txt file uses the same simple structure: one or more groups of rules, each starting with a User-agent line.
User-agent: *
Disallow: /admin/
Disallow: /private/
Allow: /admin/public-page/
User-agent: Googlebot
Disallow: /tmp/
Sitemap: https://yoursite.com/sitemap.xml
- User-agent — specifies which crawler the rules apply to. Use * for all crawlers.
- Disallow — blocks crawlers from the specified path. Disallow: /admin/ blocks everything under /admin/.
- Allow — overrides a Disallow rule for a specific path. Useful for allowing a specific page inside a blocked directory.
- Sitemap — tells crawlers where your XML sitemap is located. This can appear anywhere in the file.
- Crawl-delay — asks crawlers to wait a set number of seconds between requests (not supported by Google).

Google and Bing support two wildcard characters:
- * — matches any sequence of characters. Example: Disallow: /*.pdf$ blocks all PDF files.
- $ — matches the end of a URL. Without it, /page matches /page, /page2, and /pages/anything.

User-agent: *
Disallow: /
This is the nuclear option. Only use this for staging sites, development environments, or sites that are genuinely private. This single rule blocks crawlers from every page, and your site will gradually drop out of search results.
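The * and $ wildcard semantics described earlier are easy to misjudge by eye. Here is a minimal sketch of a Google-style pattern matcher in Python; robots_pattern_to_regex is a hypothetical helper written for illustration, not part of any standard library:

```python
import re

def robots_pattern_to_regex(pattern: str) -> "re.Pattern":
    """Translate a robots.txt path pattern into a regex.

    '*' matches any sequence of characters; a trailing '$'
    anchors the match to the end of the URL.
    """
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.compile("^" + regex)

rule = robots_pattern_to_regex("/*.pdf$")
print(bool(rule.match("/files/report.pdf")))      # True: URL ends in .pdf
print(bool(rule.match("/files/report.pdf?v=2")))  # False: '$' anchors the end
```

Because the $ in /*.pdf$ anchors the match, a PDF URL with query parameters slips past the rule — one reason to test wildcard patterns rather than trust intuition.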
User-agent: *
Disallow:
An empty Disallow directive means "allow everything." This is effectively the same as not having a robots.txt file, but it explicitly signals to crawlers that you have considered access control.
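You can confirm this behavior offline with Python's standard-library urllib.robotparser, which treats an empty Disallow as allow-all (yoursite.com is a placeholder):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow:"])  # empty Disallow = allow everything

# Any URL is crawlable under this policy.
print(rp.can_fetch("Googlebot", "https://yoursite.com/any/page"))  # True
```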
User-agent: *
Disallow: /admin/
Disallow: /wp-admin/
Disallow: /login/
Disallow: /cart/
Disallow: /checkout/
These pages have no SEO value and waste crawl budget. Block them so crawlers spend their time on your content pages instead.
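A quick offline sanity check of rules like these is possible with Python's urllib.robotparser. Note that it uses simple prefix matching, not Google's full wildcard semantics, and the URLs below are placeholders:

```python
from urllib.robotparser import RobotFileParser

rules = [
    "User-agent: *",
    "Disallow: /admin/",
    "Disallow: /cart/",
]
rp = RobotFileParser()
rp.parse(rules)

print(rp.can_fetch("*", "https://yoursite.com/admin/users"))  # False: blocked
print(rp.can_fetch("*", "https://yoursite.com/blog/post"))    # True: crawlable
```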
User-agent: *
Disallow: /search
Disallow: /?s=
Disallow: /*?q=
Internal search result pages create near-infinite thin content URLs. Google specifically warns against letting these get indexed.
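A sketch of checking the first two rules with Python's urllib.robotparser — its prefix matching happens to cover /search and /?s=, though it does not support the /*?q= wildcard form, so that one still needs testing in Google's own tools:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /search", "Disallow: /?s="])

# Internal search result URLs are caught by the prefix rules.
print(rp.can_fetch("*", "https://yoursite.com/search?q=shoes"))  # False
print(rp.can_fetch("*", "https://yoursite.com/?s=shoes"))        # False
print(rp.can_fetch("*", "https://yoursite.com/shop"))            # True
```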
Crawl-delay tells crawlers to wait a certain number of seconds between requests:
User-agent: *
Crawl-delay: 10
This asks crawlers to wait 10 seconds between each page request. This is useful for small servers that cannot handle aggressive crawling.
Important: Google ignores Crawl-delay entirely. To control Googlebot's crawl rate, use Google Search Console's crawl rate settings. Bing, Yandex, and some other crawlers do respect this directive.
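Crawlers that honor the directive can read it programmatically. For example, Python's urllib.robotparser exposes it via crawl_delay() (Python 3.6+):

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse(["User-agent: *", "Crawl-delay: 10"])

# A polite crawler would sleep this many seconds between requests.
print(rp.crawl_delay("*"))  # 10
```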
Adding a Sitemap directive to your robots.txt helps crawlers discover your XML sitemap, even if they have not found it through other means:
Sitemap: https://yoursite.com/sitemap.xml
Sitemap: https://yoursite.com/sitemap-posts.xml
Sitemap: https://yoursite.com/sitemap-pages.xml
You can list multiple sitemaps. Use the full absolute URL including the protocol. This directive can appear anywhere in the file — it does not need to be inside a User-agent group.
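When auditing a file, Python's urllib.robotparser surfaces these declarations via site_maps() (Python 3.8+); the URLs here are placeholders:

```python
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /admin/",
    "Sitemap: https://yoursite.com/sitemap.xml",
    "Sitemap: https://yoursite.com/sitemap-posts.xml",
])

# Lists every Sitemap directive found anywhere in the file.
print(rp.site_maps())
```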
Always test before deploying. A single typo can block Google from your entire site.
- Visit yoursite.com/robots.txt in a browser and read through every rule.
- Make sure Disallow: / does not appear under User-agent: *; it is the most common catastrophic error.
- Remember that www.yoursite.com and blog.yoursite.com are different hosts. Each needs its own robots.txt.

Generate a clean robots.txt file in seconds — just toggle the rules you need.
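Part of this pre-deploy checking can be automated. A sketch using Python's standard urllib.robotparser (prefix matching only; yoursite.com, the candidate rules, and the must_crawl list are placeholders you would replace with your own):

```python
from urllib.robotparser import RobotFileParser

# Candidate robots.txt content, checked before it goes live.
candidate = """\
User-agent: *
Disallow: /admin/
Disallow: /search
"""

# URLs that must stay crawlable; extend with your key pages.
must_crawl = [
    "https://yoursite.com/",
    "https://yoursite.com/blog/",
]

rp = RobotFileParser()
rp.parse(candidate.splitlines())

for url in must_crawl:
    status = "OK" if rp.can_fetch("Googlebot", url) else "BLOCKED"
    print(f"{status}: {url}")
```

Any "BLOCKED" line for a page you care about means the file should not ship.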
Open Robots.txt Generator

Our free robots.txt generator lets you build a robots.txt file without memorizing syntax. Toggle common rules on and off, add custom paths, include your sitemap URL, and copy the finished file. Everything runs in your browser with no data sent anywhere.
Once your robots.txt is configured, make sure the rest of your SEO fundamentals are in place. Use our meta tag generator to create proper title tags, descriptions, and Open Graph tags for every page on your site.