nanog mailing list archives

Re: Captchas on Cloudflare-Proxied Sites


From: nanog--- via NANOG <nanog () lists nanog org>
Date: Sun, 06 Jul 2025 20:45:32 +0200

As far as I'm aware, and nobody I've asked has shown me otherwise, there is still zero evidence that the 
high-intensity, high-anonymity bots some sites are seeing have anything to do with AI.

If they are AI, a court just ruled that AI scraping is fair use, so maybe you should offer them a zipped copy of your 
site and they won't have to scrape it.

The actual reason for CAPTCHAs is revenue. Site operators would like to give a zipped copy of their sites to OpenAI - 
for $100,000. And they want that to be the only way OpenAI can get a copy.


On 2 July 2025 12:50:28 pm GMT+02:00, niels=nanog--- via NANOG <nanog () lists nanog org> wrote:
* Constantine A. Murenin [Wed 02 Jul 2025, 05:23 CEST]:
But the bots are not a problem if you're doing proper caching and throttling.

Have you been following the news at all lately? Website operators are complaining left and right about the load from 
scrapers related to AI companies. They're seeing 10x, 100x the normal visitor load, with not just User-Agents but also 
source IP addresses masked to present as regular visitors. Captchas are unfortunately one of the more visible ways to 
address this, even if not perfect.

For example, 
https://arstechnica.com/ai/2025/03/devs-say-ai-crawlers-dominate-traffic-forcing-blocks-on-entire-countries/



      -- Niels.
_______________________________________________
NANOG mailing list 
https://lists.nanog.org/archives/list/nanog () lists nanog org/message/L5WNOGOAOZRJYR5BAEWSGOD7SPDKKX32/
_______________________________________________
NANOG mailing list 
https://lists.nanog.org/archives/list/nanog () lists nanog org/message/3SRPADJMZLXTD4U2EITRXGPFL4GK2VAR/

