Web content safeguard: Cloudflare moves to block AI bots from unauthorized data harvesting
Cloudflare, a network service that handles traffic for roughly 20% of the web, has announced a new initiative that could reshape the landscape of web data access. The company is experimenting with a "pay-per-crawl" plan, allowing domain owners to set fees that AI companies must pay to access their sites.
The move has been welcomed by major publishers, including The New York Times, The Wall Street Journal, Reuters, the Associated Press, Time, The Atlantic, and Reddit. These publishers have worked with Cloudflare to block AI bots from their websites and have backed the pay-per-crawl model.
The experiment raises questions about how pricing tiers might be established in a pay-per-crawl market. Bill Gross, an entrepreneur and founder of AI startup ProRata, argues that AI bots are the equivalent of shoplifters and should pay for their access. Shayne Longpre, a PhD candidate at MIT, counters that pushback against crawlers could threaten the transparency and openness of the web, shrinking the diversity of its ecosystem.
Not all crawlers are AI bots; some enhance security, archive webpages, or index them for search engines. The concern is that the new policy could inadvertently block these beneficial crawlers. Cloudflare says it can accommodate them by letting domain owners selectively exempt specific crawlers from payment.
The idea of charging for crawls is not new, but Cloudflare's implementation is deliberately simple: domain owners set a flat, per-request price across their entire site. In a more mature market, prices could diverge by outlet, with The New York Times potentially charging more per crawl than a local newspaper.
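Mechanically, a per-request fee maps naturally onto HTTP's long-dormant 402 Payment Required status code, which Cloudflare has reportedly built the scheme around. The sketch below shows what such a handshake could look like on the origin side; the header names, price, and local server are illustrative assumptions, not Cloudflare's actual interface.

```python
# A minimal sketch of a pay-per-crawl handshake with a flat per-request
# price. The "crawler-price" / "crawler-exact-price" header names are
# illustrative assumptions, not a confirmed Cloudflare API.
from http.server import BaseHTTPRequestHandler, HTTPServer

FLAT_PRICE_USD = "0.01"  # one flat, per-request price across the whole site

class PayPerCrawlHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # A crawler that has agreed to pay echoes the quoted price back.
        offered = self.headers.get("crawler-exact-price")
        if offered == FLAT_PRICE_USD:
            body = b"<html>...the requested page...</html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body)
        else:
            # Otherwise refuse with 402 Payment Required and quote the price.
            self.send_response(402)
            self.send_header("crawler-price", FLAT_PRICE_USD)
            self.end_headers()

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), PayPerCrawlHandler).serve_forever()
```

In practice the negotiation would run at Cloudflare's edge rather than on the origin server, but the shape of the exchange is the same: quote a price, then serve content only to requests that accept it.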
The new policy flips the default: AI bots are blocked unless site owners actively allow them. Previously, website owners using Cloudflare could choose to allow or block AI bots when setting up a domain. Cloudflare's AI Labyrinth feature, designed to trap misbehaving bots by sending them in circles through decoy pages, will continue to operate, limiting how much real data they can collect.
Google's AI Overviews feature, which displays AI-generated summaries above search results, has caused a significant drop in referral traffic to publishers. Under the new policy, site owners will have more control over who accesses their content and how it is used.
An analysis by developer Robb Knight found that Perplexity's crawler ignores robots.txt files, despite the company's claims to the contrary. This finding, along with the strain some AI crawlers place on website bandwidth, has contributed to the push for a pay-per-crawl model.
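robots.txt is the long-standing convention by which a site tells crawlers which paths are off-limits, and a well-behaved crawler checks it before every fetch. The sketch below shows that check using Python's standard-library urllib.robotparser; the user agent string and URLs are placeholders, not real identifiers.

```python
# A compliant crawler consults robots.txt before fetching a page.
# Uses only the Python standard library; the bot name and URLs below
# are placeholder assumptions for illustration.
from urllib import robotparser

rp = robotparser.RobotFileParser()
rp.set_url("https://example.com/robots.txt")
rp.read()  # fetch and parse the site's robots.txt

user_agent = "ExampleAIBot"
url = "https://example.com/articles/some-story"

if rp.can_fetch(user_agent, url):
    print("Allowed: the crawler may fetch this page")
else:
    print("Disallowed: a compliant crawler skips this URL")
```

The protocol is purely advisory, which is exactly the problem Knight's analysis highlighted: nothing technically stops a crawler from skipping this check.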
The move is seen as a win for the publishing industry, as it could help mitigate the impact of AI on their traffic and revenue. As the debate continues, it remains to be seen how the pay-per-crawl model will evolve and what implications it will have for the future of the web.