Wild & Free Tools

How to Block AI Bots with Robots.txt — GPTBot, CCBot & Every AI Crawler (2026)

Last updated: April 2026 · 8 min read · SEO Tools

AI companies are scraping the open web to train their models, and your content is likely in the training data. Robots.txt is the standard way to tell AI crawlers to stay away. Here are the major AI bot user-agents you should know about, with the exact rules to block them.

Every AI Bot User-Agent You Should Know

Bot Name | Company | Purpose | Robots.txt User-Agent Token
GPTBot | OpenAI | Training data for ChatGPT and GPT models | GPTBot
ChatGPT-User | OpenAI | ChatGPT browsing feature (retrieval, not training) | ChatGPT-User
CCBot | Common Crawl | Web archiving; its datasets are widely used for AI training | CCBot
Google-Extended | Google | Opt-out token for Gemini training (pages are fetched by Googlebot; this token never appears in requests) | Google-Extended
anthropic-ai | Anthropic | Training data for Claude models | anthropic-ai
ClaudeBot | Anthropic | Anthropic's web crawler; collects content that may be used to train Claude models | ClaudeBot
Bytespider | ByteDance | Training data for TikTok AI and ByteDance models | Bytespider
FacebookBot | Meta | Training data for Meta AI models | FacebookBot
PerplexityBot | Perplexity | AI search engine crawling and retrieval | PerplexityBot
Amazonbot | Amazon | Training data for Alexa and Amazon AI products | Amazonbot
cohere-ai | Cohere | Training data for Cohere language models | cohere-ai
Applebot-Extended | Apple | Opt-out token for Apple Intelligence training (pages are fetched by Applebot) | Applebot-Extended

Block All AI Bots — Copy-Paste Rules

Add these rules to the robots.txt file at the root of your site (yoursite.com/robots.txt); crawlers do not look for it anywhere else. Each block targets one AI crawler:

# Block AI crawlers (training and retrieval)
User-agent: GPTBot
Disallow: /

User-agent: ChatGPT-User
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Applebot-Extended
Disallow: /
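
Before deploying, it can help to confirm the file says what you think it says. The following is a minimal sketch using Python's standard-library `urllib.robotparser`. `unblocked` is a hypothetical helper name: it parses robots.txt text and lists which user-agent tokens from the table may still fetch a given path. Python's parser uses simpler matching than some crawlers do (the first group whose name appears in the crawler's user-agent wins), so treat this as a sanity check, not a guarantee of how every bot reads the file.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens from the table above.
AI_BOTS = [
    "GPTBot", "ChatGPT-User", "CCBot", "Google-Extended",
    "anthropic-ai", "ClaudeBot", "Bytespider", "FacebookBot",
    "PerplexityBot", "Amazonbot", "cohere-ai", "Applebot-Extended",
]


def unblocked(robots_txt: str, agents: list[str], path: str = "/") -> list[str]:
    """Return the agents that robots_txt still allows to fetch `path`."""
    parser = RobotFileParser()
    # parse() also records a "last checked" time, which can_fetch() requires.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if parser.can_fetch(agent, path)]


if __name__ == "__main__":
    # Hypothetical example file that only blocks two of the bots.
    my_rules = """
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /
"""
    print("Still allowed:", unblocked(my_rules, AI_BOTS))
```

To check your own rules, paste the contents of your robots.txt into `my_rules`. An empty result means every listed bot is disallowed from that path.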

The Honest Caveat: Robots.txt Is a Suggestion

This is the part most guides skip. Robots.txt is a voluntary protocol. There is no technical mechanism in robots.txt that physically prevents a bot from accessing your pages. It works like a "staff only" sign — polite visitors respect it, but there is no lock on the door.
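
A real lock has to live on the server. The following is a minimal sketch that assumes a Python WSGI application. `BlockAIBots` is a hypothetical middleware that answers `403 Forbidden` when the `User-Agent` header contains one of the tokens from the table. Google-Extended and Applebot-Extended are left out because they are robots.txt control tokens, not user-agents that appear in requests. Like robots.txt, this only stops bots that identify themselves honestly. A crawler that spoofs a browser user-agent passes straight through.

```python
from typing import Callable, Iterable

# Hypothetical blocklist built from the user-agent table; matched
# case-insensitively as substrings of the incoming User-Agent header.
DEFAULT_TOKENS = (
    "GPTBot", "ChatGPT-User", "CCBot", "anthropic-ai", "ClaudeBot",
    "Bytespider", "FacebookBot", "PerplexityBot", "Amazonbot", "cohere-ai",
)


class BlockAIBots:
    """WSGI middleware that refuses requests from self-identified AI crawlers."""

    def __init__(self, app: Callable, tokens: Iterable[str] = DEFAULT_TOKENS):
        self.app = app
        self.tokens = tuple(token.lower() for token in tokens)

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        # WSGI exposes the User-Agent request header as HTTP_USER_AGENT.
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in self.tokens):
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain"), ("Content-Length", str(len(body)))],
            )
            return [body]
        # Every other request reaches the wrapped application unchanged.
        return self.app(environ, start_response)
```

To use it, wrap your existing WSGI app object, for example `application = BlockAIBots(application)`.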

Robots.txt is necessary but not sufficient. For stronger protection, consider server-side rate limiting, bot detection services, and monitoring your access logs for unusual crawling patterns.
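
For the log-monitoring step, here is a minimal sketch that assumes the common Apache/Nginx "combined" log format, where the User-Agent is the last quoted field. `scan_access_log` is a hypothetical helper. It counts requests per self-identified AI crawler and per client IP. A single IP with a very high request count and no bot token is the kind of unusual pattern worth a closer look. The log path in the usage comment is also an assumption.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# In the "combined" log format the User-Agent is the final quoted field.
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')

AI_TOKENS = (
    "GPTBot", "ChatGPT-User", "CCBot", "anthropic-ai", "ClaudeBot",
    "Bytespider", "FacebookBot", "PerplexityBot", "Amazonbot", "cohere-ai",
)


def scan_access_log(lines: Iterable[str]) -> Tuple[Counter, Counter]:
    """Return (requests per AI token, requests per client IP)."""
    bot_hits: Counter = Counter()
    ip_hits: Counter = Counter()
    for line in lines:
        if not line.strip():
            continue
        # The client address is the first space-separated field.
        ip_hits[line.split(" ", 1)[0]] += 1
        match = USER_AGENT_FIELD.search(line)
        if not match:
            continue
        user_agent = match.group(1).lower()
        for token in AI_TOKENS:
            if token.lower() in user_agent:
                bot_hits[token] += 1
                break  # attribute each request to one token only
    return bot_hits, ip_hits


# Example usage (hypothetical path):
# with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
#     bots, ips = scan_access_log(log)
#     print(bots.most_common())
#     print(ips.most_common(10))
```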

When to Block vs. When to Allow

Scenario | Recommendation | Why
You publish original content and want to protect copyright | Block all AI training bots | Keeps compliant crawlers from adding your content to training datasets
You want to appear in AI-generated answers | Allow retrieval bots (ChatGPT-User, PerplexityBot) | These bots fetch content to cite in answers; blocking them keeps you out of those assistants' answers
You want zero AI involvement | Block everything | Maximum protection, but you disappear from AI-powered search entirely
You run an e-commerce store | Block training bots, allow retrieval bots | Product descriptions in training data help no one; appearing in AI shopping answers helps you
You have a news or media site | Block training bots at minimum | Your journalism should not train competing AI summaries for free
You want to appear in Google AI Overviews | Keep Googlebot allowed (blocking Google-Extended is optional) | AI Overviews draw on Google Search's index, which Googlebot builds; Google states that Google-Extended does not affect inclusion in Search

Training Bots vs. Retrieval Bots

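There is an important distinction most people miss:

  1. Training bots (such as GPTBot, CCBot, Google-Extended, and Bytespider) collect pages in bulk to build datasets for future AI models.
  2. Retrieval bots (such as ChatGPT-User and PerplexityBot) fetch a page when a user asks a question, so the AI answer can draw on it and cite it.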

Many publishers block training bots but allow retrieval bots. This protects intellectual property while maintaining visibility in AI-powered search results. It is the most balanced approach for most sites.
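
The balanced setup is easy to generate. The following is a minimal sketch; `build_robots_txt` is a hypothetical helper. It puts all training crawlers into one group, since robots.txt allows several `User-agent` lines to share one rule set, and gives the retrieval crawlers an explicit `Allow: /`. The bot lists are an illustrative subset drawn from the table, so adjust them to your own choices.

```python
# Illustrative subsets of the table above; edit to match your policy.
TRAINING_BOTS = ["GPTBot", "CCBot", "Google-Extended", "Bytespider"]
RETRIEVAL_BOTS = ["ChatGPT-User", "PerplexityBot"]


def build_robots_txt(training: list[str], retrieval: list[str]) -> str:
    """Disallow training crawlers site-wide and explicitly allow retrieval crawlers."""
    groups = []
    if training:
        groups.append(
            ["# AI training crawlers: keep out"]
            + [f"User-agent: {bot}" for bot in training]
            + ["Disallow: /"]
        )
    if retrieval:
        groups.append(
            ["# AI retrieval crawlers: allowed so pages can be cited in AI answers"]
            + [f"User-agent: {bot}" for bot in retrieval]
            + ["Allow: /"]
        )
    # A blank line separates groups; the file ends with a newline.
    return "\n\n".join("\n".join(group) for group in groups) + "\n"


if __name__ == "__main__":
    print(build_robots_txt(TRAINING_BOTS, RETRIEVAL_BOTS), end="")
```

On its own, the explicit `Allow` group changes nothing, because an unlisted crawler may already fetch everything. It becomes useful if your file also has a `User-agent: *` group with `Disallow` rules. A crawler follows the group that names it, so the retrieval bots would not inherit the wildcard restrictions.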

How to Check Your Current Robots.txt

  1. Visit yoursite.com/robots.txt in your browser
  2. If you see a file, check whether any AI bot user-agents are already listed
  3. If you see a 404, you have no robots.txt — crawlers can access everything
  4. Use our Robots.txt Generator to create or update your file with the AI bot rules included
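
Steps 1 to 3 can also be scripted. The following is a minimal sketch using only the Python standard library. `fetch_robots_txt` and `listed_tokens` are hypothetical helpers. The first downloads the file and returns `None` on a 404. The second reports which AI user-agent tokens already appear on a `User-agent:` line. It only checks that a bot is named, not what its group allows or disallows. The use of HTTPS is an assumption about your site.

```python
import re
import urllib.error
import urllib.request
from typing import List, Optional

AI_TOKENS = [
    "GPTBot", "ChatGPT-User", "CCBot", "Google-Extended", "anthropic-ai",
    "ClaudeBot", "Bytespider", "FacebookBot", "PerplexityBot", "Amazonbot",
    "cohere-ai", "Applebot-Extended",
]

# Captures the value of each "User-agent:" line, ignoring case and trailing comments.
USER_AGENT_LINE = re.compile(
    r"^[ \t]*user-agent[ \t]*:[ \t]*([^#\r\n]*)", re.IGNORECASE | re.MULTILINE
)


def fetch_robots_txt(site: str, timeout: float = 10.0) -> Optional[str]:
    """Download https://<site>/robots.txt, or return None if the server answers 404."""
    url = f"https://{site}/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # no robots.txt: crawlers may access everything
        raise


def listed_tokens(robots_txt: str) -> List[str]:
    """Return the AI tokens named on a User-agent line (case-insensitive)."""
    named = {m.group(1).strip().lower() for m in USER_AGENT_LINE.finditer(robots_txt)}
    return [token for token in AI_TOKENS if token.lower() in named]
```

Usage: `text = fetch_robots_txt("yoursite.com")`, then print "no robots.txt" if `text is None`, otherwise `listed_tokens(text)`.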

Related Tools

Generate robots.txt rules that block AI scrapers — all major bots covered.

Open Robots.txt Generator