
Robots.txt Disallow Patterns: The Complete Guide

Last updated: April 2026 · 7 min read

Table of Contents

  1. Basic Disallow syntax
  2. Trailing slashes matter
  3. Wildcard patterns
  4. Allow overrides
  5. Common pattern examples
  6. Frequently Asked Questions

Most robots.txt guides show you a simple Disallow: /admin/ and call it done. But real sites have complex URL structures — filter parameters, paginated pages, session IDs, query strings. Getting Disallow patterns right is what separates a functional robots.txt from one that's blocking the wrong pages or missing the right ones.

Basic Disallow Syntax Rules

Every Disallow rule belongs inside a User-agent block. The User-agent must come first, then the Disallow lines beneath it:

User-agent: *
Disallow: /admin/
Disallow: /account/

The path after Disallow is case-sensitive: /Admin/ and /admin/ are treated as different paths. Match the case of the actual URL exactly.

An empty Disallow line means "allow everything" for that user-agent. This is how you create an explicit allow-all rule for a specific bot while blocking others:

User-agent: Googlebot
Disallow:

User-agent: BadBot
Disallow: /
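A snippet like this can be sanity-checked with Python's standard-library urllib.robotparser, which implements plain prefix matching (no * or $ wildcard support, but these rules use none). The example.com URLs are placeholders:

```python
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: Googlebot
Disallow:

User-agent: BadBot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Empty Disallow: Googlebot may fetch anything.
print(rp.can_fetch("Googlebot", "https://example.com/admin/"))  # True
# Disallow: / blocks BadBot from the whole site.
print(rp.can_fetch("BadBot", "https://example.com/"))           # False
```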

Trailing Slashes: When They Matter

Disallow: /admin/ blocks /admin/ and everything inside it: /admin/users, /admin/settings, /admin/login.

Disallow: /admin (without trailing slash) blocks /admin, /admin/, /admin-panel, /administration, /admin2 — anything that starts with those characters. This can accidentally block more than intended.

For directory blocking, always use a trailing slash: Disallow: /admin/. For exact file blocking, include the full path: Disallow: /secret-file.html.

The distinction matters when you have paths like /products and /products-archive — Disallow: /products blocks both; Disallow: /products/ only blocks /products/ and its children.
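The difference is easy to verify with urllib.robotparser from Python's standard library, since plain prefix matching is exactly what's at play here. The blocked helper and the example.com URLs are made up for the demo:

```python
from urllib.robotparser import RobotFileParser

def blocked(rules: str, url: str) -> bool:
    """Parse a robots.txt snippet and report whether '*' is blocked from url."""
    rp = RobotFileParser()
    rp.parse(rules.splitlines())
    return not rp.can_fetch("*", url)

# No trailing slash: the prefix match also catches sibling paths.
print(blocked("User-agent: *\nDisallow: /admin",
              "https://example.com/admin-panel"))   # True
# Trailing slash: only the directory and its children are blocked.
print(blocked("User-agent: *\nDisallow: /admin/",
              "https://example.com/admin-panel"))   # False
```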


Using Wildcards: * and $ in Disallow Rules

Two special characters extend what Disallow can match:

* (asterisk) matches any sequence of characters. Useful for URL parameters:

Disallow: /*?*

This blocks any URL with a query string (?). Useful for blocking filter parameter duplicates en masse. Careful — this also blocks legitimate query parameters you might want indexed.

More targeted parameter blocking:

Disallow: /*?sort=*
Disallow: /*?color=*
Disallow: /*?page=*

$ (dollar sign) matches the end of the URL. Useful for blocking specific file types:

Disallow: /*.pdf$
Disallow: /*.json$

This blocks only URLs ending in .pdf or .json — not /pdf-guide/ or /json-api-docs/.
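Under the hood these two metacharacters behave like a tiny regex subset: each * becomes ".*" and a trailing $ anchors the end of the URL. A toy translation in Python (pattern_to_regex is a made-up helper, not part of any robots.txt library):

```python
import re

def pattern_to_regex(pattern: str):
    """Translate a robots.txt path pattern (* and $ only) to a regex.

    Toy sketch, not a full RFC 9309 implementation.
    """
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    # Escape regex metacharacters, then turn the escaped \* back into .*
    regex = re.escape(body).replace(r"\*", ".*")
    return re.compile("^" + regex + ("$" if anchored else ""))

pdf_rule = pattern_to_regex("/*.pdf$")
print(bool(pdf_rule.match("/files/report.pdf")))  # True  -> blocked
print(bool(pdf_rule.match("/pdf-guide/")))        # False -> crawlable
```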

Using Allow to Create Exceptions

Allow overrides Disallow for more specific paths. Order within the block does not matter to major crawlers: the most specific rule, meaning the one with the longest matching path, wins. If an Allow and a Disallow rule match with equal length, Allow takes precedence.

Example: block all of /api/ except the public endpoint:

User-agent: *
Disallow: /api/
Allow: /api/public/

Example: block everything except the homepage and blog:

User-agent: *
Disallow: /
Allow: /blog/
Allow: /$

Note that Allow: /$ matches only the homepage, because $ anchors the match at the end of the URL. A bare Allow: / would match every URL and, tying in length with Disallow: /, would win the tie and allow everything. To allow everything for a bot, use an empty Disallow: line.
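The longest-match rule can be sketched as a small resolver in Python. This is a toy model of the behavior described above (is_allowed and to_regex are made-up names), not a production parser:

```python
import re

def to_regex(pattern: str):
    """Translate a robots.txt path pattern (* and $ only) to a regex."""
    anchored = pattern.endswith("$")
    body = pattern[:-1] if anchored else pattern
    return re.compile("^" + re.escape(body).replace(r"\*", ".*")
                      + ("$" if anchored else ""))

def is_allowed(rules, path: str) -> bool:
    """rules: list of ('allow' | 'disallow', pattern) pairs.

    The longest matching pattern wins; on a tie in length,
    allow beats disallow. No matching rule means allowed.
    """
    best = None  # (pattern length, is_allow)
    for kind, pattern in rules:
        if to_regex(pattern).match(path):
            candidate = (len(pattern), kind == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]

rules = [("disallow", "/api/"), ("allow", "/api/public/")]
print(is_allowed(rules, "/api/secret"))       # False -> blocked
print(is_allowed(rules, "/api/public/data"))  # True  -> crawlable
```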

Real-World Pattern Examples

Block all session-ID URLs (a common e-commerce issue):

Disallow: /*?sessionid=*
Disallow: /*?PHPSESSID=*

Block paginated results while keeping page 1:

Disallow: /*?page=*

Or using WordPress-style pagination:

Disallow: /page/
Disallow: /*/page/

The first rule catches homepage pagination (/page/2/); the wildcard rule catches archive pagination such as /category/news/page/2/.

Block the uploads directory from being crawled (saves crawl budget):

Disallow: /wp-content/uploads/

Block specific file extensions across the site:

Disallow: /*.xlsx$
Disallow: /*.csv$
Disallow: /*.log$

Allow Googlebot to access theme and plugin assets (CSS and JS) while blocking /wp-content/ for everyone else. A crawler obeys only the single most specific User-agent block that matches it, so the Disallow must be repeated inside the Googlebot block for the Allow exceptions to take effect:

User-agent: Googlebot
Disallow: /wp-content/
Allow: /wp-content/themes/
Allow: /wp-content/plugins/

User-agent: *
Disallow: /wp-content/


Frequently Asked Questions

Does Disallow: / block everything including the homepage?

Yes. Disallow: / blocks crawling of everything on the domain. This is sometimes used intentionally on staging sites. On a production site it's a critical error: crawling stops entirely, and blocked pages can still appear in results as URL-only entries if other sites link to them.

Can I use regex in robots.txt Disallow rules?

No. Robots.txt only supports * (match any characters) and $ (end of URL) as special characters; full regex is not supported. Some crawlers handle patterns slightly differently, so rely only on * and $ for cross-crawler compatibility.

Does Disallow: /admin/ block /admin (without trailing slash)?

No. /admin/ only blocks /admin/ and its children, not /admin without the slash. To block both without also catching /admin-panel or /administration, use two rules: Disallow: /admin$ and Disallow: /admin/

How do I block URL parameters without blocking the base page?

Use wildcard patterns: Disallow: /*?color=* blocks any URL with color= in the query string while leaving the base page (/products/) unblocked.
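As a quick check, /*?color=* corresponds to the regex ^/.*\?color= (the trailing * adds nothing to the match). A one-off verification in Python, with made-up paths:

```python
import re

# /*?color=* as a regex: each * becomes .*, the literal ? is escaped
rule = re.compile(r"^/.*\?color=.*")

print(bool(rule.match("/products/?color=red")))  # True  -> blocked
print(bool(rule.match("/products/")))            # False -> crawlable
```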

Can I have multiple User-agent blocks for the same bot?

Technically yes, but it's confusing and behavior varies by parser. Best practice is one User-agent block per bot with all rules for that bot listed together.
