Robots.txt is the first file search engine crawlers request when they visit your site. It controls what gets crawled and what does not. A misconfigured robots.txt can block your entire site from Google. A well-crafted one saves crawl budget and keeps crawlers out of private directories (note that robots.txt controls crawling, not indexing — a blocked URL can still appear in results if other sites link to it).
When any crawler — Googlebot, Bingbot, or others — arrives at your site, it first requests yoursite.com/robots.txt. If the file exists, the crawler reads the rules before crawling anything else. If the file does not exist, the crawler assumes everything is fair game and crawls freely.
The file lives at your domain root. Not in a subfolder. Not with a different name. Exactly at /robots.txt.
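That root-only location can be computed from any page URL. As an illustration (the helper name `robots_url` is my own, not a standard API), here is how a crawler derives the robots.txt URL using Python's standard library:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the robots.txt URL a crawler would request for any page URL."""
    parts = urlsplit(page_url)
    # robots.txt always lives at the root of the scheme + host, never in a subfolder
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://yoursite.com/blog/post?id=7"))
# https://yoursite.com/robots.txt
```

Whatever page a crawler lands on, the path, query string, and fragment are discarded; only the scheme and host determine where robots.txt is fetched from.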
| Directive | Purpose | Example |
|---|---|---|
| User-agent | Specifies which crawler the rules apply to | User-agent: Googlebot |
| Disallow | Blocks a path from being crawled | Disallow: /admin/ |
| Allow | Overrides a Disallow for a specific path | Allow: /admin/public/ |
| Sitemap | Points crawlers to your XML sitemap | Sitemap: https://example.com/sitemap.xml |
| Crawl-delay | Seconds between requests (ignored by Google) | Crawl-delay: 10 |
| * (wildcard) | Matches any character sequence in a URL | Disallow: /*.php |
| $ (end match) | Matches the end of a URL | Disallow: /*.json$ |
| # (comment) | Adds a human-readable note | # Block staging pages |
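Put together, a file using most of these directives might look like this (the paths and sitemap URL are placeholders, not recommendations):

```text
# Block staging pages
User-agent: *
Disallow: /admin/
Allow: /admin/public/
Disallow: /*.pdf$
Crawl-delay: 10

Sitemap: https://example.com/sitemap.xml
```

Rules are grouped under a User-agent line; the Sitemap line stands alone and applies to all crawlers.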
These directives combine into a handful of common patterns:

| Pattern | Rules | What It Does |
|---|---|---|
| Allow everything | User-agent: *\nAllow: / | All crawlers can access all pages — the most open configuration |
| Block everything | User-agent: *\nDisallow: / | No crawler can access any page — useful for staging sites |
| Block one directory | User-agent: *\nDisallow: /private/ | Blocks the /private/ directory from all crawlers |
| Block multiple directories | User-agent: *\nDisallow: /admin/\nDisallow: /tmp/\nDisallow: /staging/ | Blocks three directories from all crawlers |
| Allow only Googlebot | User-agent: Googlebot\nAllow: /\nUser-agent: *\nDisallow: / | Only Google can crawl; everyone else is blocked |
| Block images directory | User-agent: *\nDisallow: /images/ | Prevents crawlers from indexing your image directory |
| Block PDF files | User-agent: *\nDisallow: /*.pdf$ | Blocks all files ending in .pdf from being crawled |
| Block query parameters | User-agent: *\nDisallow: /*?* | Blocks URLs with query strings (filters, sorts, session IDs) |
| Add sitemap reference | Sitemap: https://example.com/sitemap.xml | Tells all crawlers where your sitemap lives — place at the bottom |
| Slow down crawling | User-agent: *\nCrawl-delay: 5 | Asks bots to wait 5 seconds between requests (Google ignores this) |
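The patterns above can be sanity-checked locally with Python's standard-library `urllib.robotparser`. One caveat: the stdlib parser follows the original robots.txt spec and does not understand Google-style `*`/`$` wildcards, so verify wildcard rules with a dedicated tester instead. A minimal sketch using the "allow only Googlebot" pattern:

```python
from urllib import robotparser

# The "allow only Googlebot" pattern from the table above.
rules = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "/pricing"))  # True  (Googlebot may crawl)
print(rp.can_fetch("Bingbot", "/pricing"))    # False (everyone else is blocked)
```

The same `can_fetch` check works for any user-agent/path pair, which makes it easy to regression-test a robots.txt file before deploying it.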
Several free tools can generate or test a robots.txt file; they differ in scope:

| Feature | WildandFree Generator | Google Robots Tester | SmallSEOTools | SEOptimer |
|---|---|---|---|---|
| Generate robots.txt | ✓ Full generator with all directives | ✗ Testing only, no generation | ✓ Basic generator | ✓ Basic generator |
| Test existing file | ✓ Preview output | ✓ Excellent — official tool | ✗ No testing | ~Basic validation |
| AI bot rules | ✓ Includes GPTBot, CCBot, etc. | ✗ Not covered | ✗ Not covered | ✗ Not covered |
| Wildcard support | ✓ Full pattern matching | ✓ Shows wildcard results | ~Limited | ~Limited |
| Custom directives | ✓ Crawl-delay, multiple user-agents | ✗ Read-only testing | ~Some options | ~Some options |
| Export/download | ✓ One-click download | ✗ Not applicable | ✓ Copy text | ✓ Copy text |
| No account required | ✓ Free, no signup | ✗ Requires Search Console access | ✓ Free | ~Free with limits |
| Privacy | ✓ Runs in your browser | ✓ Google servers | ~Ad-supported, data collected | ~Ad-supported, data collected |
Watch the trailing slash: `Disallow: /admin` blocks /admin, /admin/, /administrator, and /admin-panel, because matching is by path prefix. `Disallow: /admin/` blocks only paths that start with /admin/. Be precise.

Generate a valid robots.txt file in seconds — no syntax memorization required.
Open Robots.txt Generator