Cloudflare accuses Perplexity AI of using covert web crawling to bypass AI blocks on certain sites
In a recent announcement, Cloudflare, a leading internet security and performance company, described a system designed to prevent AI companies from accessing websites without permission or compensation. The announcement singles out Perplexity AI, which has been accused of ignoring websites' opt-out requests for content scraping.
Cloudflare's research suggests that Perplexity AI has been using stealth crawling techniques, such as disguising its crawler’s user-agent string and rotating IP addresses, to access content on websites that explicitly disallow scraping. This behaviour, according to Cloudflare, is incompatible with standard web crawling norms.
To test Perplexity's behaviour, Cloudflare set up a series of experiments using brand-new test websites that were not publicly accessible, each with a robots.txt file directing bots not to access any part of the site. Despite these blocks, Perplexity was still able to retrieve the blocked content and answer questions about it, which Cloudflare took as evidence that it was actively evading the restrictions.
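To make the robots.txt part of the experiment concrete, here is a minimal sketch of how a well-behaved crawler could check a blanket opt-out before fetching a page. It uses Python's standard `urllib.robotparser`; the bot name `ExampleBot`, the host `test-site.example`, and the helper `is_allowed` are illustrative assumptions, not details from Cloudflare's report.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that tells every crawler to stay away from every path,
# similar in spirit to the blanket opt-out described above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper: a compliant crawler would run a check like this
    before every request and skip the URL when it returns False.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://test-site.example/private/page.html"
    print(is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

robots.txt is advisory: nothing in the protocol stops a client from skipping this check, which is why the dispute turns on whether a crawler chooses to honour it.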
In response, Perplexity has denied intentional wrongdoing, calling Cloudflare's report a publicity stunt. However, the issue has sparked debate about the boundaries between automated AI scraping and legitimate human requests routed through AI services.
Cloudflare has de-listed Perplexity as a verified bot and updated its blocking rules to detect Perplexity’s evasive methods. The company also urged AI companies to be transparent, identify their agents honestly, and not attempt to dodge detection by sites that are trying to block such access.
This incident highlights the importance of transparency and respect for website owners' instructions in the use of AI. Even on sites that allow access, AI crawlers should behave fairly: serve a "clear purpose", avoid flooding sites with traffic, and refrain from scraping sensitive data.
Cloudflare's system lets online publishers and other website owners block AI crawlers from seeing their content, which the company says promotes a healthier and more secure internet for all. Cloudflare has not received a statement from Perplexity in response to the accusations, and says it remains committed to protecting its customers' digital assets.
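A simplified sketch of why honest identification matters: a site can only confirm a declared crawler if the user-agent string and the source address both match what the operator publishes. The bot name, the IP range (a reserved documentation network), and the function `classify_request` below are all hypothetical, and this is not a description of Cloudflare's actual detection rules.

```python
import ipaddress

# Hypothetical published ranges for a declared crawler. 192.0.2.0/24 is a
# documentation-only network, used here purely for illustration.
DECLARED_CRAWLERS = {
    "ExampleBot": [ipaddress.ip_network("192.0.2.0/24")],
}


def classify_request(user_agent: str, source_ip: str) -> str:
    """Classify a request as 'verified', 'spoofed', or 'undeclared'.

    verified:   claims a declared bot name and comes from its published range
    spoofed:    claims a declared bot name but comes from another address
    undeclared: does not claim to be any declared bot
    """
    ip = ipaddress.ip_address(source_ip)
    for name, networks in DECLARED_CRAWLERS.items():
        if name.lower() in user_agent.lower():
            if any(ip in network for network in networks):
                return "verified"
            return "spoofed"
    return "undeclared"


if __name__ == "__main__":
    print(classify_request("Mozilla/5.0 (compatible; ExampleBot/1.0)", "192.0.2.10"))
    print(classify_request("Mozilla/5.0 (Macintosh) Chrome/124.0", "198.51.100.7"))
```

A crawler that presents a generic browser user-agent from rotating addresses lands in the "undeclared" bucket, indistinguishable from ordinary visitors by this check alone. That is the gap the evasive methods described above would exploit, and closing it requires signals beyond headers and IP lists.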
- Cloudflare's system, aimed at preventing unauthorized access by AI companies, now includes measures to detect and block evasive methods used by AI crawlers, such as those it attributes to Perplexity AI.
- The debate over AI scraping versus legitimate web requests centres on transparency, respect for websites' opt-out requests, and the responsible use of technology to keep web infrastructure secure.