Are You Accidentally Blocking ChatGPT and Claude?
Many sites block ChatGPT and Claude by accident. A plain-English guide to AI crawlers, robots.txt, and making access a deliberate choice.

You can write the most quotable content on the web, but if your robots.txt or headers slam the door on AI crawlers, none of it matters. Many sites block AI bots by accident — through a copied config, an aggressive security plugin, or a CDN default. Let's make sure yours isn't one of them.
Meet the AI crawlers
| Crawler | Operated by | Purpose |
|---|---|---|
| GPTBot | OpenAI | Trains/powers ChatGPT |
| ChatGPT-User | OpenAI | Live fetches during a chat |
| OAI-SearchBot | OpenAI | ChatGPT search index |
| ClaudeBot | Anthropic | Trains/powers Claude |
| anthropic-ai | Anthropic | Claude data collection |
| PerplexityBot | Perplexity | Perplexity answer engine |
| Google-Extended | Google | Gemini training opt-in/out |
| CCBot | Common Crawl | Open dataset many models train on |
| cohere-ai | Cohere | Cohere models |
The permissive baseline
If you want AI visibility (most businesses do), this is a healthy robots.txt:
User-agent: GPTBot
Allow: /
User-agent: ChatGPT-User
Allow: /
User-agent: OAI-SearchBot
Allow: /
User-agent: ClaudeBot
Allow: /
User-agent: anthropic-ai
Allow: /
User-agent: PerplexityBot
Allow: /
User-agent: Google-Extended
Allow: /
User-agent: CCBot
Allow: /
User-agent: cohere-ai
Allow: /
User-agent: *
Allow: /
Sitemap: https://yoursite.com/sitemap.xml
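Before trusting a robots.txt, it can help to check it programmatically. Here is a minimal sketch using Python's standard `urllib.robotparser`; the `AI_CRAWLERS` list and the `blocked_crawlers` helper are illustrative names, not part of any official tool, and Python's matching rules may differ slightly from what each crawler actually implements.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the table above.
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "OAI-SearchBot", "ClaudeBot", "anthropic-ai",
    "PerplexityBot", "Google-Extended", "CCBot", "cohere-ai",
]


def blocked_crawlers(robots_txt: str, url: str = "https://yoursite.com/") -> list[str]:
    """Return the AI crawlers that this robots.txt text disallows for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [bot for bot in AI_CRAWLERS if not parser.can_fetch(bot, url)]


# Example: a copied config that blocks one bot and leaves everyone else open.
sample = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

print(blocked_crawlers(sample))  # ['GPTBot']
```

Paste your own live robots.txt into `sample` to see which of the listed crawlers it turns away.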
Where accidental blocks hide
robots.txt is only the first place to look. AI access can also be cut off by:
- Meta tags: <meta name="robots" content="noai, noimageai"> or a blanket noindex.
- HTTP headers: X-Robots-Tag: noai set at the server or CDN.
- CDN/WAF bot rules: Cloudflare, Akamai, and others often have "block AI bots" toggles that may be on by default.
- Login/paywalls: content behind auth is invisible to crawlers.
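The first two items in that list can be checked mechanically. The sketch below scans a response's headers and HTML for the directives mentioned above, using only the standard library. `BLOCKING_TOKENS` and `find_blocking_directives` are hypothetical names, and the code assumes simple comma-separated values (it ignores the bot-specific `X-Robots-Tag: botname: directive` form). Note that noai and noimageai are not standardised, so whether a given crawler honours them is an open question.

```python
from html.parser import HTMLParser

# Directives this article flags as possible accidental blocks.
BLOCKING_TOKENS = {"noai", "noimageai", "noindex"}


def split_directives(value: str) -> list[str]:
    """Split 'noai, noimageai' into ['noai', 'noimageai'] (lowercased)."""
    return [token.strip().lower() for token in value.split(",") if token.strip()]


class RobotsMetaParser(HTMLParser):
    """Collects directives from every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives += split_directives(attr.get("content", ""))


def find_blocking_directives(headers: dict[str, str], html: str) -> dict[str, list[str]]:
    """Report blocking directives found in the headers and in the page source."""
    found = {"header": [], "meta": []}
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            found["header"] += [d for d in split_directives(value) if d in BLOCKING_TOKENS]
    parser = RobotsMetaParser()
    parser.feed(html)
    found["meta"] = [d for d in parser.directives if d in BLOCKING_TOKENS]
    return found


page = '<head><meta name="robots" content="noindex, follow"></head>'
print(find_blocking_directives({"X-Robots-Tag": "noai, noimageai"}, page))
```

CDN/WAF rules and login walls will not show up this way; those only reveal themselves in how the server responds to a bot's request.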
How to audit access
Check what a specific bot receives by spoofing its user agent:
# What does GPTBot get back?
curl -A "GPTBot" -I https://yoursite.com/
# Check for X-Robots-Tag in the response headers
curl -sI https://yoursite.com/ | grep -i "x-robots-tag"
Then view your live robots.txt and search the page source for noai / noindex.
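To run the same check across several bots at once, something like the following sketch may be useful. It sends a HEAD request per user agent and records the status code, so a 403 for GPTBot next to a 200 for a browser stands out. `status_for` and `compare_agents` are hypothetical helper names, and the `opener` parameter exists only so the functions can be tried without network access. Keep in mind that a spoofed user agent comes from your IP address, not the crawler's, so a WAF that verifies bot IPs may treat it differently from the real thing.

```python
import urllib.error
import urllib.request


def status_for(url: str, user_agent: str, opener=urllib.request.urlopen) -> int:
    """Return the HTTP status a HEAD request with this User-Agent receives."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": user_agent}
    )
    try:
        with opener(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx; the status code is exactly what we want.
        return err.code


def compare_agents(url: str, agents: list[str], opener=urllib.request.urlopen) -> dict[str, int]:
    """Map each user agent to the status it gets, so differences stand out."""
    return {agent: status_for(url, agent, opener) for agent in agents}


# Usage (performs real requests, so it is left commented out):
# print(compare_agents("https://yoursite.com/", ["Mozilla/5.0", "GPTBot", "ClaudeBot"]))
```

If every agent gets the same status, the server is probably not filtering on user agent alone.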
A decision, not a default
Allowing AI crawlers is a business choice with two sides:
- Allow: maximise visibility and citations. Right for most marketing, SaaS, and content sites.
- Restrict: protect proprietary or premium content you don't want in training data.
The mistake isn't choosing to block — it's blocking by accident and wondering why you never show up in AI answers. Make it deliberate.
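One way to keep the choice deliberate is to write the policy down as data and generate robots.txt from it. The sketch below is only an illustration: the `POLICY` mapping, the `/premium/` path, and the `render_robots_txt` helper are all assumptions made for the example, and robots.txt remains advisory (it only works for crawlers that choose to respect it).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy: each crawler maps to the path prefixes it may NOT fetch.
# An empty list means full access. "/premium/" is an illustrative path.
POLICY = {
    "GPTBot": ["/premium/"],
    "ClaudeBot": ["/premium/"],
    "CCBot": ["/premium/"],
    "ChatGPT-User": [],
    "PerplexityBot": [],
}


def render_robots_txt(policy: dict[str, list[str]]) -> str:
    """Render one User-agent group per crawler, then a permissive catch-all."""
    groups = []
    for agent, blocked_paths in policy.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in blocked_paths] or ["Allow: /"]
        groups.append("\n".join(lines))
    groups.append("User-agent: *\nAllow: /")
    return "\n\n".join(groups) + "\n"


robots_txt = render_robots_txt(POLICY)

# Confirm the rendered file behaves as intended before deploying it.
parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
print(parser.can_fetch("GPTBot", "https://yoursite.com/premium/report"))        # False
print(parser.can_fetch("GPTBot", "https://yoursite.com/blog/post"))             # True
print(parser.can_fetch("ChatGPT-User", "https://yoursite.com/premium/report"))  # True
```

Keeping the policy in version control means any change to AI access shows up in review instead of slipping in through a copied config.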
Next up: once crawlers are in, schema markup tells them exactly what they're looking at.
