Robots.txt Disallow Patterns: The Complete Guide
Most robots.txt guides show you a simple Disallow: /admin/ and call it done. But real sites have complex URL structures — filter parameters, paginated pages, session IDs, query strings. Getting Disallow patterns right is what separates a functional robots.txt from one that's blocking the wrong pages or missing the right ones.
Basic Disallow Syntax Rules
Every Disallow rule belongs inside a User-agent block. The User-agent must come first, then the Disallow lines beneath it:
User-agent: *
Disallow: /admin/
Disallow: /account/
The path after Disallow is case-sensitive. /Admin/ and /admin/ are treated as different paths. Match the actual URL exactly.
An empty Disallow line means "allow everything" for that user-agent. This is how you create an explicit allow-all rule for a specific bot while blocking others:
User-agent: Googlebot
Disallow:

User-agent: BadBot
Disallow: /
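You can check this behavior with Python's standard-library urllib.robotparser. (It implements plain prefix matching only, with no * or $ wildcard support, but that's enough here; example.com is a placeholder.)

```python
import urllib.robotparser

rules = """\
User-agent: Googlebot
Disallow:

User-agent: BadBot
Disallow: /
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# An empty Disallow means Googlebot may fetch anything;
# Disallow: / blocks BadBot from the entire site.
print(rp.can_fetch("Googlebot", "https://example.com/page"))  # True
print(rp.can_fetch("BadBot", "https://example.com/page"))     # False
```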
Trailing Slashes: When They Matter
Disallow: /admin/ blocks /admin/ and everything inside it: /admin/users, /admin/settings, /admin/login.
Disallow: /admin (without trailing slash) blocks /admin, /admin/, /admin-panel, /administration, /admin2 — anything that starts with those characters. This can accidentally block more than intended.
For directory blocking, always use a trailing slash: Disallow: /admin/. For exact file blocking, include the full path: Disallow: /secret-file.html.
The distinction matters when you have paths like /products and /products-archive — Disallow: /products blocks both; Disallow: /products/ only blocks /products/ and its children.
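The trailing-slash difference is easy to verify with urllib.robotparser, which applies the same prefix-matching rule (the helper function and example.com URLs are illustrative):

```python
import urllib.robotparser

def blocked(disallow_path, url):
    # Build a one-rule robots.txt and test whether url is blocked for all bots.
    rp = urllib.robotparser.RobotFileParser()
    rp.parse(f"User-agent: *\nDisallow: {disallow_path}\n".splitlines())
    return not rp.can_fetch("*", url)

# Without a trailing slash, /admin prefix-matches unrelated paths:
print(blocked("/admin", "https://example.com/admin-panel"))   # True
# With a trailing slash, only the directory and its children match:
print(blocked("/admin/", "https://example.com/admin-panel"))  # False
print(blocked("/admin/", "https://example.com/admin/users"))  # True
```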
Using Wildcards: * and $ in Disallow Rules
Two special characters extend what Disallow can match:
* (asterisk) matches any sequence of characters. Useful for URL parameters:
Disallow: /*?*
This blocks any URL with a query string (?). Useful for blocking filter parameter duplicates en masse. Careful — this also blocks legitimate query parameters you might want indexed.
More targeted parameter blocking:
Disallow: /*?sort=*
Disallow: /*?color=*
Disallow: /*?page=*
$ (dollar sign) matches the end of the URL. Useful for blocking specific file types:
Disallow: /*.pdf$
Disallow: /*.json$
This blocks only URLs ending in .pdf or .json — not /pdf-guide/ or /json-api-docs/.
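Under the hood, a crawler can expand * and a trailing $ into a regular expression. Here is a minimal sketch of that translation (the function names are illustrative, not from any library):

```python
import re

def pattern_to_regex(pattern):
    # A trailing $ anchors the pattern to the end of the URL.
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape everything except *, which becomes "match any characters".
    regex = "".join(".*" if ch == "*" else re.escape(ch) for ch in body)
    return re.compile("^" + regex + ("$" if anchored else ""))

def pattern_matches(pattern, path):
    return pattern_to_regex(pattern).match(path) is not None

print(pattern_matches("/*.pdf$", "/files/report.pdf"))   # True
print(pattern_matches("/*.pdf$", "/pdf-guide/"))         # False
print(pattern_matches("/*?sort=*", "/shop?sort=price"))  # True
```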
Using Allow to Create Exceptions
Allow overrides Disallow for more specific paths. For Google and most major crawlers, the order of rules within a block doesn't matter: the rule with the longest matching path wins. If an Allow and a Disallow match with equal specificity, Allow takes precedence.
Example: block all of /api/ except the public endpoint:
User-agent: *
Disallow: /api/
Allow: /api/public/
Example: block everything except the homepage and blog:

User-agent: *
Disallow: /
Allow: /blog/
Allow: /$

Note that Allow: /$ matches the homepage exactly, because the $ anchors the pattern to the end of the URL. A plain Allow: / would match every URL (every path starts with /) and cancel out the Disallow: / entirely. To allow everything for a bot, use an empty Disallow: line instead.
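The longest-match resolution described above can be sketched with simple prefix rules (illustrative code, not an official implementation):

```python
def is_allowed(rules, path):
    # rules: list of ("allow" | "disallow", pattern) using plain prefix match.
    # Longest matching pattern wins; on a length tie, allow beats disallow.
    # A path matched by no rule is allowed by default.
    best_len, verdict = -1, True
    for kind, pattern in rules:
        if path.startswith(pattern):
            better = len(pattern) > best_len
            tie_allow = len(pattern) == best_len and kind == "allow"
            if better or tie_allow:
                best_len, verdict = len(pattern), (kind == "allow")
    return verdict

rules = [("disallow", "/api/"), ("allow", "/api/public/")]
print(is_allowed(rules, "/api/private/keys"))  # False (blocked by /api/)
print(is_allowed(rules, "/api/public/docs"))   # True (longer Allow wins)
```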
Real-World Pattern Examples
Block all session-ID URLs (a common e-commerce issue):
Disallow: /*?sessionid=*
Disallow: /*?PHPSESSID=*
Block paginated results while keeping page 1:
Disallow: /*?page=*
Or for WordPress-style pagination, where paginated URLs look like /page/2/ or /category/news/page/3/:

Disallow: /page/
Disallow: /*/page/

The second rule catches pagination nested under categories and tags; Disallow: /page/ alone only matches pagination at the site root.
Block all images directory from crawling (saves crawl budget):
Disallow: /wp-content/uploads/
Block specific file extensions across the site:
Disallow: /*.xlsx$
Disallow: /*.csv$
Disallow: /*.log$
Allow Googlebot to access CSS and JS while blocking other bots:
User-agent: Googlebot
Allow: /wp-content/themes/
Allow: /wp-content/plugins/

User-agent: *
Disallow: /wp-content/
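Per-bot rules like these can be sanity-checked with urllib.robotparser (it doesn't handle * or $ wildcards, but these rules don't use any; "SomeBot" stands in for any other crawler):

```python
import urllib.robotparser

rules = """\
User-agent: Googlebot
Allow: /wp-content/themes/
Allow: /wp-content/plugins/

User-agent: *
Disallow: /wp-content/
"""

rp = urllib.robotparser.RobotFileParser()
rp.parse(rules.splitlines())

css = "https://example.com/wp-content/themes/style.css"
# Googlebot hits its own block's Allow rule; every other bot
# falls through to the User-agent: * block and is denied.
print(rp.can_fetch("Googlebot", css))  # True
print(rp.can_fetch("SomeBot", css))    # False
```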
Frequently Asked Questions
Does Disallow: / block everything including the homepage?
Yes. Disallow: / blocks everything on the domain. This is sometimes used on staging sites intentionally. On production sites it's a critical error that stops all indexing.
Can I use regex in robots.txt Disallow rules?
No. Robots.txt only supports * (any characters) and $ (end of URL) as special characters. Full regex is not supported. Some crawlers may handle regex-like patterns differently, so for cross-crawler compatibility rely only on * and $.
Does Disallow: /admin/ block /admin (without trailing slash)?
No. /admin/ only blocks /admin/ and its children, not /admin without the slash. To block exactly both, use two rules: Disallow: /admin$ and Disallow: /admin/. (A bare Disallow: /admin also covers both, but it prefix-matches everything starting with those characters, including paths like /admin-panel.)
How do I block URL parameters without blocking the base page?
Use wildcard patterns: Disallow: /*?color=* blocks any URL with color= in the query string while leaving the base page (/products/) unblocked.
Can I have multiple User-agent blocks for the same bot?
Technically yes, but it's confusing and behavior varies by parser. Best practice is one User-agent block per bot with all rules for that bot listed together.

