Question 1

Should I block AI crawlers from my website?

Accepted Answer

No universal answer. Block them if your content is licensed or IP-sensitive, or if your business model depends on direct site visits. Allow them if AI visibility is part of your distribution: being cited by ChatGPT or Perplexity is a growing way for customers to find SMEs. Many publishers split the difference. They block training crawlers (GPTBot, ClaudeBot, CCBot) and allow live-retrieval crawlers such as OAI-SearchBot and PerplexityBot, which their operators say are used to surface pages in search results rather than to train models.
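You can write the split policy above as two robots.txt groups. The Python sketch below generates that file from two lists. The bot names come from the answer, but which crawler goes in which list is your policy call. `build_robots_txt` is a hypothetical helper, not part of any library.

```python
# Hypothetical helper: render a robots.txt that blocks training crawlers
# and explicitly allows live-retrieval crawlers.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "CCBot"]
RETRIEVAL_BOTS = ["OAI-SearchBot", "PerplexityBot"]


def _group(bots: list[str], rule: str) -> list[str]:
    """One group: consecutive User-agent lines sharing the rule that follows."""
    if not bots:
        return []  # a rule with no User-agent line would be invalid, so emit nothing
    return [f"User-agent: {bot}" for bot in bots] + [rule]


def build_robots_txt(training_bots: list[str], retrieval_bots: list[str]) -> str:
    groups = [
        _group(training_bots, "Disallow: /"),  # keep training crawlers out of the whole site
        _group(retrieval_bots, "Allow: /"),    # state the allowance explicitly for retrieval bots
    ]
    # A blank line separates groups; empty groups are skipped.
    return "\n\n".join("\n".join(g) for g in groups if g) + "\n"


print(build_robots_txt(TRAINING_BOTS, RETRIEVAL_BOTS))
```

A crawler named in neither group (Googlebot, for instance) falls through to any `User-agent: *` rules you already have. Merge this output into your existing file rather than overwriting it.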

Question 2

How do I opt out of ChatGPT training?

Accepted Answer

Add two lines to your /robots.txt: User-agent: GPTBot followed by Disallow: /. OpenAI honours this. It published the GPTBot user-agent specifically so site owners can opt out of training. Your site can still appear in ChatGPT's search results, which rely on a separate crawler (OAI-SearchBot), and in classical Google Search.
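To confirm that the two-line opt-out does what the answer says, you can feed it to Python's standard-library `urllib.robotparser`. This is a minimal sketch. `is_disallowed` is a hypothetical helper and example.com is a placeholder. Python's parser is also simpler than the ones crawler operators run (it does not support wildcard paths, for instance), so treat the result as a sanity check only.

```python
from urllib.robotparser import RobotFileParser

# The two-line opt-out from the answer above, as it would appear in /robots.txt.
GPTBOT_OPT_OUT = """\
User-agent: GPTBot
Disallow: /
"""


def is_disallowed(robots_txt: str, agent: str, url: str = "https://example.com/") -> bool:
    """Return True if robots_txt forbids `agent` from fetching `url`.

    The stdlib parser uses the first named group whose User-agent token appears
    (case-insensitively) in `agent`, falling back to a `*` group if there is one.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return not parser.can_fetch(agent, url)


for bot in ["GPTBot", "OAI-SearchBot", "Googlebot"]:
    print(bot, "blocked" if is_disallowed(GPTBOT_OPT_OUT, bot) else "allowed")
```

Only GPTBot matches the group. OAI-SearchBot and Googlebot match nothing, and with no `*` group they are allowed by default.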

Question 3

Does blocking Google-Extended affect Google Search rankings?

Accepted Answer

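No. Google-Extended is a robots.txt product token, not a separate crawler. Google's usual crawlers still fetch your pages, and the token only controls whether that content may be used to train Gemini models and to ground answers in Gemini apps. Classical Google Search follows the regular Googlebot rules, which a Google-Extended block leaves unaffected. By the same logic, it does not remove your pages from AI Overviews, which are part of Search. This is the cleanest way to opt out of Google AI training without losing search visibility.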
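If you already have a robots.txt, the Google-Extended opt-out is one more group appended to it. The Python sketch below uses a hypothetical `add_opt_out` helper. It appends `User-agent: Google-Extended` / `Disallow: /` only when the token is not already named, and it never touches your existing Googlebot or `*` rules.

```python
import re


def add_opt_out(robots_txt: str, token: str = "Google-Extended") -> str:
    """Append a 'Disallow: /' group for `token` unless a User-agent line already names it.

    Existing groups (Googlebot, *, and so on) are left exactly as they are.
    """
    # Match e.g. "User-agent: Google-Extended", case-insensitively, allowing a trailing comment.
    already_named = re.compile(
        rf"^[ \t]*user-agent[ \t]*:[ \t]*{re.escape(token)}[ \t\r]*(#.*)?$",
        re.IGNORECASE | re.MULTILINE,
    )
    if already_named.search(robots_txt):
        return robots_txt  # token already has a group; review its rules by hand
    body = robots_txt.rstrip("\n")
    separator = "\n\n" if body else ""  # blank line between the old content and the new group
    return f"{body}{separator}User-agent: {token}\nDisallow: /\n"


EXISTING = "User-agent: *\nDisallow: /admin/\n"
print(add_opt_out(EXISTING))
```

Google-Extended has no HTTP user-agent string of its own, so it will never appear in your server logs. Checking robots.txt is the way to verify the opt-out.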

Question 4

What is llms.txt and do I need one?

Accepted Answer

/llms.txt is an emerging convention, proposed in late 2024, for sites that want to be easy for LLMs to read. Think of it as a sitemap for AI assistants: a markdown file at the root of your site that indexes your most important content with short descriptions, sized to fit in an LLM's context window. It is not required, and it is not yet clear which AI providers actually read it. Having one mainly signals that you have thought about AI discoverability.
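Below is a minimal llms.txt generator in Python. It assumes the layout described in the llms.txt proposal: an H1 with the site name, a blockquote summary, then H2 sections listing markdown links with optional notes. A section titled `Optional` marks links an assistant can skip when context is short. The `Link` class, the `build_llms_txt` function, the company name and the URLs are all hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    note: str = ""  # optional one-line description shown after the link


def build_llms_txt(name: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Render an llms.txt index: H1 name, blockquote summary, then H2 sections of links."""
    out = [f"# {name}", "", f"> {summary}"]
    for heading, links in sections.items():
        out += ["", f"## {heading}", ""]
        for link in links:
            suffix = f": {link.note}" if link.note else ""
            out.append(f"- [{link.title}]({link.url}){suffix}")
    return "\n".join(out) + "\n"


print(build_llms_txt(
    "Example Ltd",
    "Hypothetical UK bookkeeping firm for small businesses.",
    {
        "Services": [Link("Pricing", "https://example.com/pricing.md", "Plans and monthly fees")],
        "Optional": [Link("Blog", "https://example.com/blog.md")],
    },
))
```

Serve the output as plain text at /llms.txt. Keep it short: the point is a curated index, not a copy of your whole site.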

Question 5

Do AI crawlers actually honour robots.txt?

Accepted Answer

The major ones say they do. GPTBot, ClaudeBot, Google-Extended, Applebot-Extended, Meta-ExternalAgent and CCBot all have published user-agent tokens, and their operators state that they honour robots.txt. The exception has been Perplexity: multiple investigations in 2024 reported its crawler fetching pages that robots.txt disallowed. If you specifically want to block Perplexity, add IP-based blocking as a backstop to robots.txt.
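Here is a minimal sketch of the IP-based backstop in Python, using the standard `ipaddress` module. The network ranges are IETF documentation placeholders (RFC 5737), not real crawler addresses. Substitute ranges the operator publishes, if any, or ranges you observe in your own logs. `should_block` is a hypothetical helper you would call from your web server or middleware.

```python
import ipaddress

# Placeholder ranges from the RFC 5737 documentation blocks; replace with real ones.
BLOCKED_NETWORKS = [ipaddress.ip_network(n) for n in ("192.0.2.0/24", "198.51.100.0/24")]
# Lower-case user-agent substrings to refuse regardless of source address.
BLOCKED_AGENT_TOKENS = ("perplexitybot",)


def should_block(client_ip: str, user_agent: str) -> bool:
    """Return True if the request comes from a blocked network or a blocked user agent."""
    ip = ipaddress.ip_address(client_ip)
    # IPv6 addresses never match IPv4 networks (and vice versa), so mixed lists are safe.
    if any(ip in net for net in BLOCKED_NETWORKS):
        return True
    ua = user_agent.lower()
    return any(token in ua for token in BLOCKED_AGENT_TOKENS)


print(should_block("192.0.2.10", "Mozilla/5.0"))  # matched by network, whatever the user agent
```

In production this check usually lives in a CDN, WAF or reverse proxy rather than in application code, but the logic is the same. The network check matters because a user-agent test alone cannot catch a crawler that does not identify itself.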

Question 6

Is blocking AI crawlers legally required?

Accepted Answer

No UK statute requires it. UK GDPR may apply if your content includes personal data that is scraped at scale, but that is a fact-specific legal question. In the EU, Article 4(3) of the Copyright in the Digital Single Market Directive lets rightsholders opt out of text and data mining (TDM) by reserving their rights in a machine-readable way, and the EU AI Act requires providers of general-purpose AI models to respect those reservations. Many UK publishers signal an opt-out via robots.txt or ai.txt as good practice, even though enforcement is unclear. None of this is legal advice.

Is your website opted in to AI training?

What we read

/robots.txt

/ai.txt

/llms.txt

No "right" answer. Just clarity.

Who's asking

GPTBot

ClaudeBot

Google-Extended

PerplexityBot

CCBot (Common Crawl)

Applebot-Extended

Meta-ExternalAgent & Bytespider

Run the scan. See your stance.
