Robots.txt is the first file search engine crawlers request when they visit your site. It controls what gets crawled and what does not. A misconfigured robots.txt can block your entire site from Google. A well-crafted one saves crawl budget and keeps crawlers out of private directories (note that robots.txt controls crawling, not indexing — a blocked URL can still appear in results if other sites link to it).
When any crawler — Googlebot, Bingbot, or others — arrives at your site, it first requests yoursite.com/robots.txt. If the file exists, the crawler reads the rules before crawling anything else. If the file does not exist, the crawler assumes everything is fair game and crawls freely.
The file lives at your domain root. Not in a subfolder. Not with a different name. Exactly at /robots.txt.
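That root-only location can be computed from any page URL. As an illustration (the helper name `robots_url` is my own, not a standard API), here is how a crawler derives the robots.txt URL using Python's standard library:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for any page URL."""
    parts = urlsplit(page_url)
    # robots.txt always lives at the root of the scheme + host, never in a subfolder
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://yoursite.com/blog/post?id=7"))
# https://yoursite.com/robots.txt
```

Whatever page a crawler lands on, the path, query string, and fragment are discarded; only the scheme and host determine where robots.txt is fetched from.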
| Directive | Purpose | Example |
|---|---|---|
| User-agent | Specifies which crawler the rules apply to | User-agent: Googlebot |
| Disallow | Blocks a path from being crawled | Disallow: /admin/ |
| Allow | Overrides a Disallow for a specific path | Allow: /admin/public/ |
| Sitemap | Points crawlers to your XML sitemap | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay | Seconds between requests (ignored by Google) | Crawl-delay: 10 |
| * (wildcard) | Matches any character sequence in a URL | Disallow: /*.php |
| $ (end match) | Matches the end of a URL | Disallow: /*.json$ |
| # (comment) | Adds a human-readable note | # Block staging pages |
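Put together, a file using most of these directives might look like this (the paths and sitemap URL are placeholders, not recommendations):

```text
# Block staging pages
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Disallow: /*.pdf$
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
```

Rules are grouped under a User-agent line; the Sitemap line stands alone and applies to all crawlers.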
These directives combine into a handful of common patterns:

| Pattern | Rules | What It Does |
|---|---|---|
| Allow everything | User-agent: *\nAllow: / | All crawlers can access all pages — the most open configuration |
| Block everything | User-agent: *\nDisallow: / | No crawler can access any page — useful for staging sites |
| Block one directory | User-agent: *\nDisallow: /private/ | Blocks the /private/ directory from all crawlers |
| Block multiple directories | User-agent: *\nDisallow: /admin/\nDisallow: /tmp/\nDisallow: /staging/ | Blocks three directories from all crawlers |
| Allow only Googlebot | User-agent: Googlebot\nAllow: /\nUser-agent: *\nDisallow: / | Only Google can crawl; everyone else is blocked |
| Block images directory | User-agent: *\nDisallow: /images/ | Prevents crawlers from indexing your image directory |
| Block PDF files | User-agent: *\nDisallow: /*.pdf$ | Blocks all files ending in .pdf from being crawled |
| Block query parameters | User-agent: *\nDisallow: /*?* | Blocks URLs with query strings (filters, sorts, session IDs) |
| Add sitemap reference | Sitemap: https://example.com/sitemap.xml | Tells all crawlers where your sitemap lives — place at the bottom |
| Slow down crawling | User-agent: *\nCrawl-delay: 5 | Asks bots to wait 5 seconds between requests (Google ignores this) |
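The patterns above can be sanity-checked locally with Python's standard-library `urllib.robotparser`. One caveat: the stdlib parser follows the original robots.txt spec and does not understand Google-style `*`/`$` wildcards, so verify wildcard rules with a dedicated tester instead. A minimal sketch using the "allow only Googlebot" pattern:

```python
from urllib import robotparser

# The "allow only Googlebot" pattern from the table above.
rules = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/pricing"))  # True  (Googlebot may crawl)
print(rp.can_fetch("Bingbot", "/pricing"))    # False (everyone else is blocked)
```

The same `can_fetch` check works for any user-agent/path pair, which makes it easy to regression-test a robots.txt file before deploying it.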
Several free tools can generate or test a robots.txt file; they differ in scope:

| Feature | WildandFree Generator | Google Robots Tester | SmallSEOTools | SEOptimer |
|---|---|---|---|---|
| Generate robots.txt | ✓ Full generator with all directives | ✗ Testing only, no generation | ✓ Basic generator | ✓ Basic generator |
| Test existing file | ✓ Preview output | ✓ Excellent — official tool | ✗ No testing | ~Basic validation |
| AI bot rules | ✓ Includes GPTBot, CCBot, etc. | ✗ Not covered | ✗ Not covered | ✗ Not covered |
| Wildcard support | ✓ Full pattern matching | ✓ Shows wildcard results | ~Limited | ~Limited |
| Custom directives | ✓ Crawl-delay, multiple user-agents | ✗ Read-only testing | ~Some options | ~Some options |
| Export/download | ✓ One-click download | ✗ Not applicable | ✓ Copy text | ✓ Copy text |
| No account required | ✓ Free, no signup | ✗ Requires Search Console access | ✓ Free | ~Free with limits |
| Privacy | ✓ Runs in your browser | ✓ Google servers | ~Ad-supported, data collected | ~Ad-supported, data collected |
Watch the trailing slash: `Disallow: /admin` blocks /admin, /admin/, /administrator, and /admin-panel, because matching is by path prefix. `Disallow: /admin/` blocks only paths that start with /admin/. Be precise.

Generate a valid robots.txt file in seconds — no syntax memorization required.
Open Robots.txt Generator