Technical GEO · Jun 9, 2026 · 6 min read

Are You Accidentally Blocking ChatGPT and Claude?

Many sites block ChatGPT and Claude by accident. A plain-English guide to AI crawlers, robots.txt, and making access a deliberate choice.

You can write the most quotable content on the web, but if your robots.txt or headers slam the door on AI crawlers, none of it matters. Many sites block AI bots by accident — through a copied config, an aggressive security plugin, or a CDN default. Let's make sure yours isn't one of them.

Meet the AI crawlers

Crawler	Operated by	Purpose
GPTBot	OpenAI	Collects training data for OpenAI models
ChatGPT-User	OpenAI	Live fetches during a chat
OAI-SearchBot	OpenAI	ChatGPT search index
ClaudeBot	Anthropic	Collects training data for Claude models
anthropic-ai	Anthropic	Claude data collection
PerplexityBot	Perplexity	Perplexity answer engine
Google-Extended	Google	robots.txt control token for Gemini training (not a separate crawler; Googlebot does the fetching)
CCBot	Common Crawl	Open dataset many models train on
cohere-ai	Cohere	Cohere models

The permissive baseline

If you want AI visibility (most businesses do), this is a healthy robots.txt:

User-agent: GPTBot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: anthropic-ai
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: CCBot
Allow: /

User-agent: cohere-ai
Allow: /

User-agent: *
Allow: /

Sitemap: https://yoursite.com/sitemap.xml
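To check a robots.txt without reading every group by eye, here is a minimal Python sketch built on the standard library's urllib.robotparser. The helper name blocked_crawlers and the sample config are made up for illustration. Python's parser applies rules in file order, which may differ from how an individual crawler matches rules, so treat the result as a first pass rather than a verdict.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens taken from the crawler table above.
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "CCBot", "cohere-ai",
]


def blocked_crawlers(robots_txt: str, url: str = "https://yoursite.com/") -> list[str]:
    """Return the AI crawler tokens that this robots.txt disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, url)]


# Illustrative copied config: one bot is blocked, everyone else is allowed.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_crawlers(sample))  # ['GPTBot']
```

Paste your live robots.txt into sample, or read it from a file, and an empty list means none of the listed crawlers is blocked at the robots.txt level.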

Where accidental blocks hide

robots.txt is only the first place to look. AI access can also be cut off by:

Meta tags: <meta name="robots" content="noai, noimageai"> (non-standard directives that not every crawler honors) or a blanket noindex.
HTTP headers: X-Robots-Tag: noai set at the server or CDN.
CDN/WAF bot rules: Cloudflare, Akamai, and others often have "block AI bots" toggles that may be on by default.
Login/paywalls: content behind auth is invisible to crawlers.
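The meta tag and header checks can be scripted. The sketch below uses only Python's standard library. The names RobotsMetaFinder and find_blocking_directives are hypothetical, and it assumes you already have the page HTML and response headers in hand. It handles plain comma-separated directives only and ignores bot-specific header forms such as "googlebot: noindex".

```python
from html.parser import HTMLParser

# Directive names from the list above; "noai" and "noimageai" are non-standard.
BLOCKING = {"noai", "noimageai", "noindex"}


def split_directives(value: str) -> set[str]:
    """Split a comma-separated directive string into lowercase tokens."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            self.directives |= split_directives(attr.get("content") or "")


def find_blocking_directives(html: str, headers: dict[str, str]) -> set[str]:
    """Return blocking directives found in the page's meta tags or X-Robots-Tag header."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    header_value = next(
        (v for k, v in headers.items() if k.lower() == "x-robots-tag"), ""
    )
    return (finder.directives | split_directives(header_value)) & BLOCKING


page = '<html><head><meta name="robots" content="noai, noimageai"></head></html>'
print(sorted(find_blocking_directives(page, {"X-Robots-Tag": "noindex"})))
# ['noai', 'noimageai', 'noindex']
```

CDN/WAF rules and login walls will not show up here, since they act before any HTML is served.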

How to audit access

Check what a specific bot receives by spoofing its user agent:

# What does GPTBot get back?
curl -A "GPTBot" -I https://yoursite.com/

# Check for X-Robots-Tag in the headers GPTBot receives
# (-s hides curl's progress meter so only the matching header prints)
curl -s -A "GPTBot" -I https://yoursite.com/ | grep -i "x-robots-tag"

Then view your live robots.txt and search the page source for noai / noindex.
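To compare several bots in one pass, the same check can be sketched in Python with urllib.request. The function names build_probe, probe, and compare_agents are hypothetical. Keep in mind that real crawlers send longer user-agent strings and some firewalls verify crawler IP ranges, so a spoofed request only approximates what a bot sees. Google-Extended cannot be tested this way because it is a robots.txt token, not a user agent.

```python
import urllib.error
import urllib.request
from typing import Optional


def build_probe(url: str, agent: str) -> urllib.request.Request:
    """HEAD request with a spoofed User-Agent, similar to `curl -A agent -I url`."""
    return urllib.request.Request(url, method="HEAD", headers={"User-Agent": agent})


def probe(url: str, agent: str, timeout: float = 10.0) -> tuple[int, Optional[str]]:
    """Return (status code, X-Robots-Tag value or None) for one user agent."""
    try:
        with urllib.request.urlopen(build_probe(url, agent), timeout=timeout) as resp:
            return resp.status, resp.headers.get("X-Robots-Tag")
    except urllib.error.HTTPError as err:
        # Blocks such as 403 or 429 arrive as exceptions; they are the cases we want to see.
        return err.code, err.headers.get("X-Robots-Tag")


def compare_agents(url: str, agents: list[str]) -> dict[str, tuple[int, Optional[str]]]:
    """Probe `url` once per user agent so the results can be read side by side."""
    return {agent: probe(url, agent) for agent in agents}
```

Calling compare_agents("https://yoursite.com/", ["Mozilla/5.0", "GPTBot", "ClaudeBot", "PerplexityBot"]) gives one result per agent. If the browser-like agent gets a 200 while a bot gets a 403, a bot rule is the likely cause.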

A decision, not a default

Allowing AI crawlers is a business choice with two sides:

Allow: maximise visibility and citations. Right for most marketing, SaaS, and content sites.
Restrict: protect proprietary or premium content you don't want in training data.

The mistake isn't choosing to block — it's blocking by accident and wondering why you never show up in AI answers. Make it deliberate.
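One way to keep the choice explicit is to generate robots.txt from a written-down policy instead of editing it by hand. The sketch below is illustrative only: render_robots and the policy dictionary are made up, and the split shown (opt out of the training-oriented tokens from the table, stay open to live fetches and search) is just one possible choice, not a recommendation.

```python
def render_robots(policy: dict[str, bool], sitemap: str) -> str:
    """Build robots.txt text: True allows the whole site, False blocks it."""
    groups = []
    for agent, allowed in policy.items():
        rule = "Allow: /" if allowed else "Disallow: /"
        groups.append(f"User-agent: {agent}\n{rule}")
    groups.append(f"Sitemap: {sitemap}")
    return "\n\n".join(groups) + "\n"


# Hypothetical policy: block training-oriented tokens, allow live fetch and search.
policy = {
    "GPTBot": False,
    "ClaudeBot": False,
    "CCBot": False,
    "Google-Extended": False,
    "ChatGPT-User": True,
    "OAI-SearchBot": True,
    "PerplexityBot": True,
    "*": True,
}

print(render_robots(policy, "https://yoursite.com/sitemap.xml"))
```

Keeping the policy in version control means a change in access shows up in review rather than in a drop in citations.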

Next up: once crawlers are in, schema markup tells them exactly what they're looking at.
