Cloudflare vs. Perplexity: The Battle Over Stealth Crawling
*Tech Brief · Aug 5, 2025 · Updated Oct 6, 2025*
In early August 2025, Cloudflare accused Perplexity AI of using “stealth crawling” to access web content that had explicitly blocked its bots via robots.txt and firewall rules. According to Cloudflare, Perplexity allegedly disguised its crawlers as regular Chrome browsers, rotated IP addresses and ASNs, and continued scraping pages in violation of publisher preferences. As a result, Cloudflare removed Perplexity from its Verified Bots program and deployed new technical measures to block the activity. Perplexity dismissed the accusations as a “publicity stunt,” but the incident has reignited a larger debate about how AI browsers, answer engines, and intelligent agents interact with the open web — and what that means for SEO, publishers, and traffic in the next year.
Cloudflare's Claims and the Evidence Presented
Cloudflare says its systems observed Perplexity’s activity change after sites blocked its declared bots. Instead of stopping, traffic began arriving with user agents mimicking Google Chrome on macOS and coming from infrastructure not tied to Perplexity’s published IP ranges. Cloudflare claims this traffic was hitting tens of thousands of domains and generating millions of requests per day.
To confirm its suspicions, Cloudflare ran controlled tests. It created honeypot pages that were hidden from the public, disallowed in robots.txt, and blocked by WAF rules. Despite this, Cloudflare says content from those pages appeared in Perplexity's responses — suggesting that the system still retrieved the material through undeclared crawlers.
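The honeypot technique described above can be sketched in a few lines: publish a page that is disallowed in robots.txt and linked from nowhere, then flag any client that fetches it anyway. The path, log format, and addresses below are illustrative, not Cloudflare's actual implementation.

```python
# Honeypot-style stealth-crawler detection: no legitimate visitor or
# compliant bot should ever request an unlinked, robots.txt-disallowed
# page, so any client that does is suspect.

HONEYPOT_PATH = "/internal/trap-page"  # hypothetical hidden URL

def flag_stealth_crawlers(access_log):
    """Return (ip, user_agent) pairs that requested the honeypot page.

    `access_log` is an iterable of (ip, path, user_agent) tuples, as
    might be extracted from a web server's access log.
    """
    flagged = set()
    for ip, path, user_agent in access_log:
        if path == HONEYPOT_PATH:
            flagged.add((ip, user_agent))
    return flagged

# Documentation IPs (RFC 5737); the second entry mimics the reported
# behavior of a crawler presenting a browser user agent.
log = [
    ("203.0.113.7", "/articles/foo", "Chrome-like UA"),
    ("198.51.100.9", "/internal/trap-page", "Chrome-like UA"),
]
print(flag_stealth_crawlers(log))
```

In Cloudflare's reported tests the "detection" step happened at the other end: the honeypot content surfaced in Perplexity's answers, proving the fetch occurred even without a server-side match like this one.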
Perplexity’s Response and the Loss of Verified Status
Perplexity has rejected Cloudflare’s narrative, calling it inaccurate and a marketing ploy. However, the consequences are tangible: being delisted from the Verified Bots program means that Perplexity loses automatic trust from Cloudflare-protected sites. Without that status, its crawlers may face more aggressive rate limits, CAPTCHAs, or outright blocking unless the company changes its approach or re-negotiates access.
Perplexity’s public documentation claims it respects robots.txt and offers publishers tags to opt out. However, it also states that it may still index certain non-content elements, such as titles or metadata, even when full crawling is blocked. The gap between stated policy and alleged practice is at the heart of the dispute.
The Role of Robots.txt in the Dispute
The robots.txt file is a voluntary convention, not an access-control mechanism. It relies on good-faith compliance from crawlers. While most major search engines honor it, ignoring it is not technically illegal in most jurisdictions. For publishers, this means that a bot that disregards the file can still collect data unless stronger technical or contractual measures are in place.
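Python's standard library makes the voluntary nature of the convention concrete: a compliant crawler consults robots.txt before fetching, but nothing in the protocol stops a non-compliant one from skipping the check entirely. The bot names and URLs below are examples only.

```python
# robots.txt is honored only if the client chooses to check it.
from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
# parse() accepts the file's lines directly; a real crawler would
# first fetch https://example.com/robots.txt.
rp.parse([
    "User-agent: PerplexityBot",
    "Disallow: /",
])

# A well-behaved crawler calls can_fetch() and obeys the answer.
print(rp.can_fetch("PerplexityBot", "https://example.com/article"))  # False
print(rp.can_fetch("SomeOtherBot", "https://example.com/article"))   # True
```

Note that `can_fetch` merely reports the publisher's preference; enforcement is entirely up to the caller, which is why Cloudflare argues blocking must move to the network layer.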
Cloudflare’s move to block Perplexity at the network level reflects a growing shift: enforcement must happen beyond robots.txt — either through verified bot authentication, legal agreements, or both.
The Larger Backstory: Publishers vs. AI Answer Engines
Perplexity has faced criticism before. In 2024, it was accused of paraphrasing or directly summarizing news articles without proper attribution or traffic referrals, even when sites had blocked its crawlers. The broader issue is that AI answer engines and “smart browsers” can serve rich, self-contained responses without sending users to the original sources. This undermines the traditional “crawl-for-traffic” trade-off that powered the web’s economics for decades.
Cloudflare, for its part, has been vocal about changing that bargain. It recently proposed a “Pay-Per-Crawl” model, where AI agents either compensate publishers for access, respect a block, or negotiate terms. It has also introduced Web Bot Authentication, which uses cryptographic signatures to verify that a bot is who it claims to be — with OpenAI cited as a compliant example.
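The verification flow behind signature-based bot identity can be illustrated with a deliberately simplified sketch. The actual Web Bot Authentication proposal uses asymmetric HTTP Message Signatures (so sites verify against a bot operator's published public key); this example substitutes HMAC with a shared secret purely to show the request-signing and verification steps. All names and keys here are hypothetical.

```python
# Simplified stand-in for cryptographic bot authentication: the bot
# signs each request, and the site verifies the signature before
# extending trust. Real deployments would use public-key signatures,
# not a shared secret.
import hashlib
import hmac

SHARED_SECRET = b"demo-only-secret"  # stands in for real key material

def sign_request(bot_name: str, path: str) -> str:
    msg = f"{bot_name}:{path}".encode()
    return hmac.new(SHARED_SECRET, msg, hashlib.sha256).hexdigest()

def verify_request(bot_name: str, path: str, signature: str) -> bool:
    expected = sign_request(bot_name, path)
    # Constant-time comparison avoids leaking signature bytes.
    return hmac.compare_digest(expected, signature)

sig = sign_request("ExampleBot", "/article")
print(verify_request("ExampleBot", "/article", sig))   # True
print(verify_request("SpoofedBot", "/article", sig))   # False
```

The design point is that a spoofed user agent string is free to forge, while a valid signature is not, which is what separates a verified bot from a browser-impersonating crawler.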
Short-Term Impacts (Next 1–3 Months)
- Reduced access for Perplexity to sites protected by Cloudflare unless publishers explicitly allow it.
- More aggressive blocking by publishers, combining robots.txt with firewall rules, verified bot lists, and IP filtering.
- Minimal immediate SEO gains for blocked sites, since the traffic drain is more about zero-click AI consumption than crawling alone.
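One of the enforcement layers mentioned above, IP filtering against a bot operator's published ranges, is straightforward to sketch: a request claiming to be a declared crawler is trusted only if its source IP falls inside the operator's documented CIDR blocks. The ranges below are RFC 5737 documentation addresses, not any real operator's.

```python
# IP-based bot verification: trust a declared crawler only when its
# source address is inside the operator's published ranges.
import ipaddress

PUBLISHED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")  # illustrative CIDRs
]

def ip_matches_declared_bot(ip: str) -> bool:
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in PUBLISHED_RANGES)

print(ip_matches_declared_bot("192.0.2.44"))   # True
print(ip_matches_declared_bot("203.0.113.5"))  # False
```

This is exactly the check that rotating IPs and ASNs is designed to defeat, which is why it works best combined with the cryptographic verification approaches discussed earlier.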
Medium-Term Scenarios (3–12 Months)
1. “Verified Web” Adoption
More AI companies adopt bot authentication and strict compliance with publisher settings. Some publishers move toward licensing agreements, trading controlled access for compensation or branding.
2. Arms Race Continues
Non-compliant bots keep rotating IPs and disguising themselves as browsers. Platforms respond with automated detection, signature-based blocking, and potentially legal actions.
3. Regulation & Standardization
Governments or industry groups introduce enforceable standards for AI crawling and training. This could separate compliant, licensed agents from gray-market scrapers.
Practical Takeaways for Publishers and SEO Teams
- Do not rely solely on robots.txt — pair it with stronger measures like bot verification, WAF rules, and IP filtering.
- Set a clear AI policy: decide whether to allow, block, or monetize access.
- Monitor AI referrals separately: if you do allow AI bots, track their traffic and conversions to assess ROI.
- Review your terms of service: explicitly state allowed and prohibited uses of your content.
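Tracking AI referrals separately, as recommended above, mostly comes down to segmenting visits by referrer domain. The domain list below is an assumption; adjust it to the engines you actually allow, and note that many AI fetches carry no referrer at all.

```python
# Segment AI-engine referrals out of a visit log by referrer host.
from urllib.parse import urlparse

# Assumed referrer domains for AI answer engines; extend as needed.
AI_REFERRERS = {"perplexity.ai", "www.perplexity.ai", "chatgpt.com"}

def is_ai_referral(referrer_url: str) -> bool:
    host = urlparse(referrer_url).netloc.lower()
    return host in AI_REFERRERS

visits = [
    ("https://www.perplexity.ai/search?q=example", "/pricing"),
    ("https://www.google.com/", "/pricing"),
]
ai_visits = [v for v in visits if is_ai_referral(v[0])]
print(len(ai_visits))  # 1
```

Once segmented, AI-referred sessions can be compared against search traffic on conversion rate and engagement to decide whether allowing a given bot is worth the crawl cost.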
The Bottom Line
The Cloudflare–Perplexity clash is a microcosm of a bigger transformation: the collision between open web norms and AI-native consumption. Over the next year, expect tighter bot identity verification, more publisher control, and the first real tests of AI crawling payment models. The players that balance compliance, transparency, and value exchange will define the new rules of the game — and the publishers that adapt early will be in the best position to protect both their rights and their revenue.
Conclusion: The Future of Crawling and AI
The Cloudflare–Perplexity dispute underscores that publishers can no longer rely on convention alone: protecting content now requires explicit AI policies backed by technical enforcement. Expect **"stealth crawling"** to remain a recurring term in this debate. The publishers who manage AI access deliberately, rather than by default, will be best placed to preserve both their rights and a healthy web ecosystem as crawlers, answer engines, and content creators work out a sustainable balance.
