Distribution Readiness

AI Bot Policy (robots.txt)

An explicit AI bot policy in robots.txt tells crawlers what your intent is. Without it, AI bots inherit whatever your wildcard rule says — and that's usually wrong if you haven't thought about it. SaaSalyst checks robots.txt for explicit User-agent groups for the 6 major AI crawlers.

What SaaSalyst Checks

SaaSalyst parses robots.txt per RFC 9309 and looks for explicit User-agent groups for: GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, and meta-externalagent. The check passes when at least 3 of the 6 have explicit groups. It fails when a wildcard Disallow: / blocks AI bots without any explicit override (likely accidental). It warns when no explicit AI bot policy exists at all.
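The pass/fail/warn logic described above can be sketched in a few lines of Python. This is an illustrative simplification, not SaaSalyst's actual implementation: real RFC 9309 parsing matches user-agent tokens case-insensitively and handles path patterns, which this sketch skips.

```python
# Simplified sketch of the 3-of-6 AI bot policy check.
# Not SaaSalyst's actual code; real RFC 9309 parsing is case-insensitive
# on UA tokens and handles full path patterns.

AI_BOTS = {"GPTBot", "ClaudeBot", "PerplexityBot",
           "Google-Extended", "Applebot-Extended", "meta-externalagent"}

def audit_robots_txt(text: str) -> str:
    """Return 'pass', 'fail', or 'warn' per the rules described above."""
    explicit = set()           # AI bots that have an explicit User-agent group
    wildcard_blocks_all = False
    current_agents = []        # UA tokens for the group currently being parsed
    in_group_rules = False     # True once a rule line has started this group

    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # strip comments and whitespace
        if ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            if in_group_rules:                # a rule line ended the last group
                current_agents, in_group_rules = [], False
            current_agents.append(value)
        elif field in ("allow", "disallow"):
            in_group_rules = True
            if field == "disallow" and value == "/" and "*" in current_agents:
                wildcard_blocks_all = True
            explicit.update(a for a in current_agents if a in AI_BOTS)

    if len(explicit) >= 3:
        return "pass"                         # at least 3 of 6 addressed
    if wildcard_blocks_all and not explicit:
        return "fail"                         # accidental wildcard block
    return "warn"                             # no explicit AI bot policy
```

For example, a file containing only a wildcard `Disallow: /` group returns "fail", while one with explicit groups for GPTBot, ClaudeBot, and PerplexityBot returns "pass".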

Why This Matters

Most robots.txt files were written before AI crawlers existed, so they fall into one of three categories: (1) wildcard Allow: / — AI crawlers can read everything, which is fine but unintentional; (2) wildcard Disallow: / — AI crawlers can read nothing, which is almost always accidental; (3) selective rules for Googlebot only — AI crawlers inherit the defaults.
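For illustration, minimal (hypothetical) files for the three categories look like this:

```
# (1) Wildcard allow — AI crawlers inherit full access
User-agent: *
Allow: /

# (2) Wildcard block — AI crawlers inherit a total block (often accidental)
User-agent: *
Disallow: /

# (3) Selective rules for Googlebot only — AI crawlers fall back to the
#     wildcard group if one exists, or to "allow everything" if none does
User-agent: Googlebot
Disallow: /private/
```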

The canonical UA list (github.com/ai-robots-txt/ai.robots.txt) tracks known AI crawler user agents, including the 6 that SaaSalyst checks. An explicit policy avoids the failure mode where you intended to allow AI access but a wildcard rule silently blocks it.

SaaSalyst flags accidental wildcard blocks as a hard fail (high severity). Silent inheritance — no policy at all — is a soft warn: the inherited behavior is likely fine, but explicit declaration removes ambiguity for future maintainers and makes intent legible to AI crawlers.

How to Fix It

  1. Add explicit User-agent groups for the 6 major AI crawlers (GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Applebot-Extended, meta-externalagent) with the Allow / Disallow rules that match your intent.
  2. For most public-facing SaaS products, allow AI crawlers: each block becomes "User-agent: GPTBot" on one line followed by "Allow: /" on the next (and similarly for the others).
  3. If you want to opt out of model training only, allow OAI-SearchBot (OpenAI's search crawler) but disallow GPTBot (its training crawler) — these are separate crawlers.
  4. Reference the canonical UA list at github.com/ai-robots-txt/ai.robots.txt — it tracks new AI crawlers as they appear.
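Putting the steps above together, an explicit allow-all policy (one sketch of steps 1–2; swap Allow for Disallow per bot to match your intent) looks like:

```
User-agent: GPTBot
Allow: /

User-agent: ClaudeBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Google-Extended
Allow: /

User-agent: Applebot-Extended
Allow: /

User-agent: meta-externalagent
Allow: /
```

For the training opt-out in step 3, you would instead pair "User-agent: OAI-SearchBot" / "Allow: /" with "User-agent: GPTBot" / "Disallow: /".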

Frequently Asked Questions

What's the difference between this and the OAI-SearchBot check?

OAI-SearchBot is one specific crawler. This SaaSalyst check looks at the broader AI bot policy: are at least 3 of the 6 major AI crawlers explicitly addressed? Both checks can fire independently — passing one does not imply passing the other.

Why high severity for an accidental wildcard?

Because the consequence is silent — AI crawlers fail to index your site and you don't see the error in any dashboard. SaaSalyst rates it high so it shows up on the report and prompts a fix before AI traffic vanishes.

Why only warn (not fail) when no policy exists?

Silent inheritance is usually benign: a wildcard Allow: / lets AI crawlers in, and the absence of an explicit policy is under-specification, not active blocking. SaaSalyst uses the warning level to surface it as something to address without penalizing the score the way an active block would.

Check Your SaaS Now | Free

SaaSalyst scans your website in 30 seconds and checks for AI Bot Policy (robots.txt) along with 101+ other business readiness signals.

Scan Your App

Related Checks SaaSalyst Runs