AI crawler visibility

AI Bot Access Checker

Check whether your website is open to search engines, AI search bots, and AI training crawlers. Scan robots.txt, crawler rules, and sitemap signals, then get a safer recommended configuration.

Important note on Google-Extended

Google-Extended is a robots.txt product token, not a separate search crawler like Googlebot. Blocking Google-Extended does not block Google Search indexing or ranking, but it can limit whether Google may use the site's content for Gemini and Vertex AI generative uses.

GPTBot for training control
OAI-SearchBot and ChatGPT-User
Googlebot and Bingbot

Vexifya fetches only public http/https URLs server-side with SSRF protections, redirect limits, timeouts, and response size caps.
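
To make the SSRF point concrete, here is a minimal sketch of one such pre-fetch check in Python. It is not Vexifya's actual implementation: the function name `is_safe_target` and the idea of passing in already-resolved IP addresses are assumptions made for illustration. A real fetcher would also need to connect to the vetted address, re-check every redirect hop, and enforce timeouts and size caps.

```python
import ipaddress
from urllib.parse import urlsplit


def is_safe_target(url: str, resolved_ips: list[str]) -> bool:
    """Hypothetical pre-fetch check: http/https only, public addresses only.

    `resolved_ips` is assumed to come from the caller's own DNS lookup
    (for example socket.getaddrinfo); it is passed in so the check itself
    performs no network access.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    if not resolved_ips:
        return False
    for ip_text in resolved_ips:
        try:
            ip = ipaddress.ip_address(ip_text)
        except ValueError:
            # Anything that is not a valid IP literal is rejected.
            return False
        # is_global is False for loopback, private, link-local and
        # other non-public ranges.
        if not ip.is_global:
            return False
    return True
```

Rejecting the request when any resolved address is non-public is a deliberately conservative choice: a hostname that resolves to both a public and an internal address is treated as unsafe.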

Example result

Sample preview only, not a real scan.

Preview

Search Visibility: 92/100
AI Search Visibility: 80/100
Training Exposure: Medium

Googlebot: Allowed
Bingbot: Allowed
OAI-SearchBot: Allowed
GPTBot: Blocked
Google-Extended: Blocked

Crawler access guide

What is an AI bot access checker?

An AI bot access checker reads public robots.txt rules and basic site access signals to show whether search engines, AI answer bots, and AI training crawlers are allowed to fetch your pages.
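
As a rough illustration of that check, the sketch below uses Python's standard `urllib.robotparser` to evaluate a robots.txt body against a list of bot tokens. The robots.txt content, the bot list, and the function name `check_access` are assumptions chosen for the example. The standard-library matcher is also simpler than what individual crawlers implement, so treat the result as an approximation.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: block two AI-related tokens, allow everything else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: *
Allow: /
"""

BOTS = ["Googlebot", "Bingbot", "OAI-SearchBot", "ChatGPT-User",
        "GPTBot", "Google-Extended"]


def check_access(robots_txt: str, bots: list[str], path: str = "/") -> dict[str, bool]:
    """Return, for each bot token, whether the rules allow fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, path) for bot in bots}


if __name__ == "__main__":
    for bot, allowed in check_access(ROBOTS_TXT, BOTS).items():
        print(f"{bot}: {'Allowed' if allowed else 'Blocked'}")
```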

GPTBot vs OAI-SearchBot

GPTBot is associated with improving and training OpenAI models. OAI-SearchBot is used for discovering and surfacing content in ChatGPT search experiences. Publishers can allow OAI-SearchBot and ChatGPT-User while blocking GPTBot to limit training use.

Should you block AI training bots?

Blocking training bots can reduce training exposure, but it is a publisher policy choice rather than a universal best practice. The right setup depends on licensing, traffic goals, and content strategy.

Should you allow AI search bots?

If discovery in ChatGPT, Perplexity, and other answer surfaces matters, keep AI search bots allowed. For OpenAI, OAI-SearchBot handles search/discovery while ChatGPT-User handles user-triggered retrieval.

robots.txt limitations

robots.txt cannot force compliance. For stronger enforcement, combine it with server rules, WAF policies, Cloudflare bot controls, signed access, or authentication.
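
A server rule of the kind mentioned above could look like the following sketch: a small WSGI middleware in Python that returns 403 when the User-Agent header contains a blocked token. The names `block_ai_crawlers` and `BLOCKED_AGENT_TOKENS`, and the choice of tokens, are assumptions for illustration. User-Agent headers can be spoofed, so this only stops crawlers that identify themselves honestly.

```python
# Hypothetical block list; adjust to match your own crawler policy.
BLOCKED_AGENT_TOKENS = ("gptbot", "ccbot", "bytespider")


def block_ai_crawlers(app, blocked_tokens=BLOCKED_AGENT_TOKENS):
    """Wrap a WSGI app and reject requests from listed user agents."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(token in user_agent for token in blocked_tokens):
            # Refuse the request at the origin instead of relying on
            # the crawler to honor robots.txt.
            start_response(
                "403 Forbidden",
                [("Content-Type", "text/plain; charset=utf-8")],
            )
            return [b"Automated access is not permitted.\n"]
        return app(environ, start_response)

    return middleware


def demo_app(environ, start_response):
    """Stand-in application used only to show how the wrapper is applied."""
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"hello\n"]


application = block_ai_crawlers(demo_app)
```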

Cloudflare AI bot blocking note

Cloudflare can detect and block some automated traffic at the edge, which is stronger than publishing instructions in robots.txt. Use robots.txt for crawler policy and Cloudflare or WAF rules for enforcement.

Recommended setup for most publishers

For most public websites, the balanced setup is:

  • Allow Googlebot and Bingbot for normal search indexing.
  • Allow OAI-SearchBot and ChatGPT-User if AI search visibility matters.
  • Block GPTBot, CCBot, ClaudeBot, anthropic-ai, Bytespider, Meta-ExternalAgent, Applebot-Extended, Amazonbot, and other training-focused crawlers if the publisher does not want broad AI training use.
  • Use Cloudflare, WAF, or server rules if stronger enforcement is needed, because robots.txt is only an instruction file.
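
The setup above can be turned into a robots.txt file with a short script. The following Python sketch is one possible rendering, not an official template: the function name `render_robots_txt` and the one-group-per-bot layout are assumptions, and the bot lists simply mirror the bullets above.

```python
# Bot tokens taken from the recommended setup above.
ALLOWED_BOTS = ["Googlebot", "Bingbot", "OAI-SearchBot", "ChatGPT-User"]
BLOCKED_BOTS = [
    "GPTBot", "CCBot", "ClaudeBot", "anthropic-ai", "Bytespider",
    "Meta-ExternalAgent", "Applebot-Extended", "Amazonbot",
]


def render_robots_txt(allowed: list[str], blocked: list[str]) -> str:
    """Render one robots.txt group per bot: allow or disallow the whole site."""
    groups = [f"User-agent: {bot}\nAllow: /" for bot in allowed]
    groups += [f"User-agent: {bot}\nDisallow: /" for bot in blocked]
    # Groups are separated by a blank line; the file ends with a newline.
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(render_robots_txt(ALLOWED_BOTS, BLOCKED_BOTS), end="")
```

Bots that are not listed in either group fall back to the default behavior, which is to crawl, so add a `User-agent: *` group if you want an explicit default.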

Related crawl analysis tool

After checking robots.txt access rules, use the Googlebot Log Analyzer to inspect real Googlebot, Bingbot, AI bot, and SEO crawler activity from server access logs.
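
To give a sense of what such log inspection involves, here is a minimal Python sketch that counts requests per bot token in access-log lines. It is not the Googlebot Log Analyzer itself. It assumes a log format in which the user agent is the last double-quoted field on each line (as in the common "combined" format), and the names `count_bot_hits` and `BOT_TOKENS` are hypothetical. Because it matches on the user-agent string alone, it cannot tell genuine crawlers from impostors.

```python
import re
from collections import Counter

# Hypothetical list of tokens to look for in the User-Agent field.
BOT_TOKENS = ["Googlebot", "Bingbot", "GPTBot", "OAI-SearchBot",
              "ChatGPT-User", "ClaudeBot"]

# Assumes the user agent is the final double-quoted field on the line.
UA_PATTERN = re.compile(r'"(?P<ua>[^"]*)"\s*$')


def count_bot_hits(log_lines, tokens=BOT_TOKENS) -> Counter:
    """Count log lines whose user agent contains one of the bot tokens."""
    counts = Counter()
    for line in log_lines:
        match = UA_PATTERN.search(line)
        if not match:
            continue  # Line does not end with a quoted field; skip it.
        user_agent = match.group("ua").lower()
        for token in tokens:
            if token.lower() in user_agent:
                counts[token] += 1
                break  # Attribute each request to at most one bot.
    return counts
```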

FAQ

What is GPTBot?

GPTBot is OpenAI's crawler associated with improving and training generative AI foundation models. Publishers commonly block GPTBot when they want to limit training use while still allowing search crawlers. Blocking GPTBot is a separate decision from allowing OAI-SearchBot for ChatGPT search visibility.

What is OAI-SearchBot?

OAI-SearchBot is OpenAI's search crawler for discovering and surfacing content in ChatGPT search experiences. If a public site wants a better chance of appearing in ChatGPT search answers, it should usually allow OAI-SearchBot. It is separate from GPTBot, which governs training use.

What is ChatGPT-User?

ChatGPT-User is a user-triggered agent used when someone asks ChatGPT or a Custom GPT to access or retrieve a page. It is not the same as an automatic training crawler. Blocking it can make user-requested page retrieval less reliable.

What is Google-Extended?

Google-Extended is a robots.txt product token, not a separate search crawler like Googlebot. It lets publishers manage whether content that Google crawls may be used for Gemini Apps and Vertex AI generative APIs. It has no separate HTTP user-agent string: Google's existing crawlers fetch the pages, and the token only controls how the content may be used.

Will blocking Google-Extended hurt my Google rankings?

No. Google says Google-Extended does not affect inclusion in Google Search and is not used as a Google Search ranking signal. Blocking Google-Extended should not block Googlebot from indexing pages. Keep Googlebot allowed if normal Google Search visibility matters.

Will blocking GPTBot remove my site from ChatGPT search?

Blocking GPTBot is not the same as blocking ChatGPT search. For ChatGPT search visibility, the more important bot is OAI-SearchBot. A common setup is to allow OAI-SearchBot and ChatGPT-User while blocking GPTBot to limit training use.

Can robots.txt stop all AI scraping?

No. robots.txt is an instruction file for respectful crawlers, not an access-control system. It can communicate your crawler policy, but it cannot force every bot to comply. For stronger protection, use server rules, WAF rules, Cloudflare bot controls, or authentication.

Should I block AI training bots?

Many publishers block training-focused crawlers if they do not want broad AI training use of their content. Others allow them for reach, partnerships, or product reasons. The best choice depends on your content rights, business model, and visibility goals.

Should I allow AI search bots?

Allow AI search bots if visibility in AI answer and search experiences matters to your site. OAI-SearchBot and ChatGPT-User serve different purposes from GPTBot. Blocking all AI bots may reduce discovery in AI search surfaces.

Is Cloudflare better than robots.txt for blocking AI bots?

Cloudflare, WAF, and server rules can enforce blocks more strongly than robots.txt because they can reject requests at the edge or origin. robots.txt is still useful for publishing crawler policy. Many sites use both: robots.txt for instructions and Cloudflare or server controls for enforcement.