Robots.txt Disallow Patterns: The Complete Guide
Most robots.txt guides show you a simple Disallow: /admin/ and call it done. But real sites have complex URL structures — filter parameters, paginated pages, session IDs, query strings. Getting Disallow patterns right is what separates a functional robots.txt from one that's blocking the wrong pages or missing the right ones.
Basic Disallow Syntax Rules
Every Disallow rule belongs inside a User-agent block. The User-agent must come first, then the Disallow lines beneath it:
User-agent: *
Disallow: /admin/
Disallow: /account/
The path after Disallow is case-sensitive. /Admin/ and /admin/ are treated as different paths. Match the actual URL exactly.
An empty Disallow line means "allow everything" for that user-agent. This is how you create an explicit allow-all rule for a specific bot while blocking others:
User-agent: Googlebot
Disallow:

User-agent: BadBot
Disallow: /
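You can check this behavior with Python's standard-library urllib.robotparser. (It implements plain prefix matching only, with no * or $ wildcard support, but that's enough here; example.com is a placeholder.)

```python
import urllib.robotparser

rules = """\
User-agent: Googlebot
Disallow:

User-agent: BadBot
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# An empty Disallow means Googlebot may fetch anything;
# Disallow: / blocks BadBot from the entire site.
print(rp.can_fetch("Googlebot", "https://example.com/page"))  # True
print(rp.can_fetch("BadBot", "https://example.com/page"))     # False
```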
Trailing Slashes: When They Matter
Disallow: /admin/ blocks /admin/ and everything inside it: /admin/users, /admin/settings, /admin/login.
Disallow: /admin (without trailing slash) blocks /admin, /admin/, /admin-panel, /administration, /admin2 — anything that starts with those characters. This can accidentally block more than intended.
For directory blocking, always use a trailing slash: Disallow: /admin/. For exact file blocking, include the full path: Disallow: /secret-file.html.
The distinction matters when you have paths like /products and /products-archive — Disallow: /products blocks both; Disallow: /products/ only blocks /products/ and its children.
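The trailing-slash difference is easy to verify with urllib.robotparser, which applies the same prefix-matching rule (the helper function and example.com URLs are illustrative):

```python
import urllib.robotparser

def blocked(disallow_path, url):
    # Build a one-rule robots.txt and test whether url is blocked for all bots.
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(f"User-agent: *\nDisallow: {disallow_path}\n".splitlines())
    return not rp.can_fetch("*", url)

# Without a trailing slash, /admin prefix-matches unrelated paths:
print(blocked("/admin", "https://example.com/admin-panel"))   # True
# With a trailing slash, only the directory and its children match:
print(blocked("/admin/", "https://example.com/admin-panel"))  # False
print(blocked("/admin/", "https://example.com/admin/users"))  # True
```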
Using Wildcards: * and $ in Disallow Rules
Two special characters extend what Disallow can match:
* (asterisk) matches any sequence of characters. Useful for URL parameters:
Disallow: /*?*
This blocks any URL with a query string (?). Useful for blocking filter parameter duplicates en masse. Careful — this also blocks legitimate query parameters you might want indexed.
More targeted parameter blocking:
Disallow: /*?sort=*
Disallow: /*?color=*
Disallow: /*?page=*
$ (dollar sign) matches the end of the URL. Useful for blocking specific file types:
Disallow: /*.pdf$
Disallow: /*.json$
This blocks only URLs ending in .pdf or .json — not /pdf-guide/ or /json-api-docs/.
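Under the hood, a crawler can expand * and a trailing $ into a regular expression. Here is a minimal sketch of that translation (the function names are illustrative, not from any library):

```python
import re

def pattern_to_regex(pattern):
    # A trailing $ anchors the pattern to the end of the URL.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape everything except *, which becomes "match any characters".
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

def pattern_matches(pattern, path):
    return pattern_to_regex(pattern).match(path) is not None

print(pattern_matches("/*.pdf$", "/files/report.pdf"))   # True
print(pattern_matches("/*.pdf$", "/pdf-guide/"))         # False
print(pattern_matches("/*?sort=*", "/shop?sort=price"))  # True
```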
Using Allow to Create Exceptions
Allow overrides Disallow for more specific paths. For Google and most major crawlers, the order of rules within a block doesn't matter: the rule with the longest matching path wins. If an Allow and a Disallow match with equal specificity, Allow takes precedence.
Example: block all of /api/ except the public endpoint:
User-agent: *
Disallow: /api/
Allow: /api/public/
Example: block everything except the homepage and blog:

User-agent: *
Disallow: /
Allow: /blog/
Allow: /$

Note that Allow: /$ matches the homepage exactly, because the $ anchors the pattern to the end of the URL. A plain Allow: / would match every URL (every path starts with /) and cancel out the Disallow: / entirely. To allow everything for a bot, use an empty Disallow: line instead.
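The longest-match resolution described above can be sketched with simple prefix rules (illustrative code, not an official implementation):

```python
def is_allowed(rules, path):
    # rules: list of ("allow" | "disallow", pattern) using plain prefix match.
    # Longest matching pattern wins; on a length tie, allow beats disallow.
    # A path matched by no rule is allowed by default.
    best_len, verdict = -1, True
    for kind, pattern in rules:
        if path.startswith(pattern):
            better = len(pattern) > best_len
            tie_allow = len(pattern) == best_len and kind == "allow"
            if better or tie_allow:
                best_len, verdict = len(pattern), (kind == "allow")
    return verdict

rules = [("disallow", "/api/"), ("allow", "/api/public/")]
print(is_allowed(rules, "/api/private/keys"))  # False (blocked by /api/)
print(is_allowed(rules, "/api/public/docs"))   # True (longer Allow wins)
```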
Real-World Pattern Examples
Block all session-ID URLs (a common e-commerce issue):
Disallow: /*?sessionid=*
Disallow: /*?PHPSESSID=*
Block paginated results while keeping page 1:
Disallow: /*?page=*
Or for WordPress-style pagination, where paginated URLs look like /page/2/ or /category/news/page/3/:

Disallow: /page/
Disallow: /*/page/

The second rule catches pagination nested under categories and tags; Disallow: /page/ alone only matches pagination at the site root.
Block all images directory from crawling (saves crawl budget):
Disallow: /wp-content/uploads/
Block specific file extensions across the site:
Disallow: /*.xlsx$
Disallow: /*.csv$
Disallow: /*.log$
Allow Googlebot to access CSS and JS while blocking other bots:
User-agent: Googlebot
Allow: /wp-content/themes/
Allow: /wp-content/plugins/

User-agent: *
Disallow: /wp-content/
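Per-bot rules like these can be sanity-checked with urllib.robotparser (it doesn't handle * or $ wildcards, but these rules don't use any; "SomeBot" stands in for any other crawler):

```python
import urllib.robotparser

rules = """\
User-agent: Googlebot
Allow: /wp-content/themes/
Allow: /wp-content/plugins/

User-agent: *
Disallow: /wp-content/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

css = "https://example.com/wp-content/themes/style.css"
# Googlebot hits its own block's Allow rule; every other bot
# falls through to the User-agent: * block and is denied.
print(rp.can_fetch("Googlebot", css))  # True
print(rp.can_fetch("SomeBot", css))    # False
```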
Frequently Asked Questions
Does Disallow: / block everything including the homepage?
Yes. Disallow: / blocks everything on the domain. This is sometimes used on staging sites intentionally. On production sites it's a critical error that stops all indexing.
Can I use regex in robots.txt Disallow rules?
No. Robots.txt only supports * (any characters) and $ (end of URL) as special characters. Full regex is not supported. Some crawlers may handle regex-like patterns differently, so for cross-crawler compatibility rely only on * and $.
Does Disallow: /admin/ block /admin (without trailing slash)?
No. /admin/ only blocks /admin/ and its children, not /admin without the slash. To block exactly both, use two rules: Disallow: /admin$ and Disallow: /admin/. (A bare Disallow: /admin also covers both, but it prefix-matches everything starting with those characters, including paths like /admin-panel.)
How do I block URL parameters without blocking the base page?
Use wildcard patterns: Disallow: /*?color=* blocks any URL with color= in the query string while leaving the base page (/products/) unblocked.
Can I have multiple User-agent blocks for the same bot?
Technically yes, but it's confusing and behavior varies by parser. Best practice is one User-agent block per bot with all rules for that bot listed together.

