Pure Ranker
Technical GEO · Jun 9, 2026 · 6 min read

Are You Accidentally Blocking ChatGPT and Claude?

Many sites block ChatGPT and Claude by accident. A plain-English guide to AI crawlers, robots.txt, and making access a deliberate choice.


You can write the most quotable content on the web, but if your robots.txt or headers slam the door on AI crawlers, none of it matters. Many sites block AI bots by accident — through a copied config, an aggressive security plugin, or a CDN default. Let's make sure yours isn't one of them.

Meet the AI crawlers

Crawler | Operated by | Purpose
GPTBot | OpenAI | Collects content that may be used to train OpenAI models
ChatGPT-User | OpenAI | Live fetches during a chat
OAI-SearchBot | OpenAI | ChatGPT search index
ClaudeBot | Anthropic | Collects content that may be used to train Claude
anthropic-ai | Anthropic | Claude data collection
PerplexityBot | Perplexity | Perplexity answer engine
Google-Extended | Google | Control token for Gemini training (not a separate crawler; Googlebot does the fetching)
CCBot | Common Crawl | Open dataset many models train on
cohere-ai | Cohere | Cohere models

The permissive baseline

If you want AI visibility (most businesses do), this robots.txt explicitly allows every crawler in the table:

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
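Rather than eyeballing a long robots.txt, you can ask a parser which AI crawlers it actually blocks. The sketch below uses Python's standard-library urllib.robotparser; the agent list comes from the crawler table, https://yoursite.com/ is the article's placeholder, and blocked_agents and copied_config are hypothetical names for illustration. One caveat: robotparser matches more simply than the robots.txt standard (RFC 9309). It picks the first matching group and rule in file order and matches user-agent tokens as substrings, so treat it as a sanity check, not a replica of how every crawler reads the file.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the crawler table.
AI_AGENTS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "CCBot", "cohere-ai",
]


def blocked_agents(robots_txt: str, url: str = "https://yoursite.com/") -> list[str]:
    """Return the AI user-agent tokens that robots_txt forbids from fetching url."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return [agent for agent in AI_AGENTS if not parser.can_fetch(agent, url)]


# Hypothetical config copied from another site that quietly blocks two AI crawlers.
copied_config = """\
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_agents(copied_config))  # ['GPTBot', 'CCBot']
```

Run against the permissive baseline above, the same function should return an empty list.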

Where accidental blocks hide

robots.txt is only the first place to look. AI access can also be cut off by:

  • Meta tags: <meta name="robots" content="noai, noimageai"> (non-standard directives, so support varies by crawler) or a blanket noindex.
  • HTTP headers: X-Robots-Tag: noai set at the server or CDN.
  • CDN/WAF bot rules: Cloudflare, Akamai, and others often have "block AI bots" toggles that may be on by default.
  • Login/paywalls: content behind auth is invisible to crawlers.
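The meta-tag and header checks in this list are easy to script. A minimal sketch, assuming you already have a page's HTML as a string and its response headers as a dict; RobotsMetaParser and find_blocking_directives are hypothetical names. It only reads <meta name="robots"> tags and plain comma-separated X-Robots-Tag values, so user-agent-prefixed header values such as googlebot: noindex are not detected.

```python
from html.parser import HTMLParser

# Directives named in the list above; "noai" and "noimageai" are non-standard.
BLOCKING_DIRECTIVES = {"noai", "noimageai", "noindex"}


def split_directives(value: str) -> set[str]:
    """Split a comma-separated directive list such as "noai, noimageai"."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaParser(HTMLParser):
    """Collect directives from every <meta name="robots" content="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names; values keep their case.
        if tag != "meta":
            return
        attributes = {name: value or "" for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            self.directives |= split_directives(attributes.get("content", ""))


def find_blocking_directives(html: str, headers: dict[str, str]) -> set[str]:
    """Return blocking directives found in robots meta tags or X-Robots-Tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    found = set(parser.directives)
    for name, value in headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-robots-tag":
            found |= split_directives(value)
    return found & BLOCKING_DIRECTIVES


page = '<head><meta name="robots" content="noai, noimageai"></head>'
print(sorted(find_blocking_directives(page, {"X-Robots-Tag": "noindex"})))
# ['noai', 'noimageai', 'noindex']
```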

How to audit access

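CDNs and WAFs that verify crawlers by IP address may answer a spoofed request differently from the real bot, so treat this as a first pass rather than proof. To see roughly what a specific bot receives, spoof its user agent: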

# What does GPTBot get back?
curl -A "GPTBot" -I https://yoursite.com/

# Check for X-Robots-Tag in the response headers
curl -sI -A "GPTBot" https://yoursite.com/ | grep -i "x-robots-tag"

Then open your live robots.txt at https://yoursite.com/robots.txt and search the page source for noai and noindex.
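To repeat the curl check for every crawler in one pass, a short Python script can send a HEAD request per user agent and summarise the status code and any X-Robots-Tag header. It spoofs only the User-Agent header, just as curl -A does. This is a sketch: probe, verdict and report are hypothetical helper names, the verdict rules are rough heuristics rather than any official classification, and Google-Extended is omitted because it never sends requests of its own.

```python
import urllib.error
import urllib.request

# User-agent tokens from the crawler table, minus Google-Extended
# (a robots.txt control token, not a crawler that makes requests).
AI_AGENTS = ["GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot",
             "anthropic-ai", "PerplexityBot", "CCBot", "cohere-ai"]


def probe(url: str, agent: str, timeout: float = 10.0) -> tuple[int, str]:
    """HEAD the URL with a spoofed User-Agent; return (status, X-Robots-Tag value)."""
    request = urllib.request.Request(url, method="HEAD", headers={"User-Agent": agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status, response.headers.get("X-Robots-Tag", "")
    except urllib.error.HTTPError as err:
        # A WAF block such as 403 raises HTTPError, which still has a code and headers.
        return err.code, err.headers.get("X-Robots-Tag", "")


def verdict(status: int, robots_tag: str) -> str:
    """Summarise one probe result using rough, assumed heuristics."""
    if status >= 400:
        return f"possibly blocked (HTTP {status})"
    directives = robots_tag.lower()
    if any(d in directives for d in ("noai", "noimageai", "noindex")):
        return f"reachable, but X-Robots-Tag: {robots_tag}"
    return "ok"


def report(url: str) -> None:
    """Print one line per AI user agent. Makes real network requests."""
    for agent in AI_AGENTS:
        status, robots_tag = probe(url, agent)
        print(f"{agent:<14} {verdict(status, robots_tag)}")


# Usage (needs network access): report("https://yoursite.com/")
```

Any agent that comes back blocked while a normal browser request succeeds points at a user-agent rule in your server, CDN, or WAF configuration.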

A decision, not a default

Allowing AI crawlers is a business choice with two sides:

  • Allow: maximise visibility and citations. Right for most marketing, SaaS, and content sites.
  • Restrict: protect proprietary or premium content you don't want in training data.

The mistake isn't choosing to block — it's blocking by accident and wondering why you never show up in AI answers. Make it deliberate.
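One way to keep the choice deliberate is to write the policy down as expectations and check robots.txt against them whenever it changes. The sketch below assumes a hypothetical restrict policy (training crawlers kept out of /premium/, everyone else allowed) and uses Python's urllib.robotparser; RESTRICT_POLICY, EXPECTATIONS and policy_violations are illustrative names. It relies on a robots.txt rule worth knowing: a crawler that matches a named User-agent group follows only that group and ignores the * group.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical "restrict" policy: training crawlers stay out of /premium/,
# while live-fetch and search crawlers (covered by the * group) may fetch everything.
RESTRICT_POLICY = """\
User-agent: GPTBot
User-agent: ClaudeBot
User-agent: CCBot
Disallow: /premium/

User-agent: *
Allow: /
"""

# The decision written down as (user agent, path, expected to be allowed).
EXPECTATIONS = [
    ("GPTBot", "/premium/report", False),
    ("GPTBot", "/blog/", True),
    ("ClaudeBot", "/premium/report", False),
    ("CCBot", "/premium/report", False),
    ("ChatGPT-User", "/premium/report", True),
    ("OAI-SearchBot", "/blog/", True),
]


def policy_violations(robots_txt: str, expectations, base_url: str = "https://yoursite.com"):
    """Return every expectation that robots_txt does not satisfy."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (agent, path, allowed)
        for agent, path, allowed in expectations
        if parser.can_fetch(agent, base_url + path) != allowed
    ]


print(policy_violations(RESTRICT_POLICY, EXPECTATIONS))  # []
```

Against a file containing only User-agent: * and Allow: /, the function instead returns the three /premium/ expectations for GPTBot, ClaudeBot and CCBot, flagging that the policy is not actually in place.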

Next up: once crawlers are in, schema markup tells them exactly what they're looking at.