nanog mailing list archives
Re: Correctly dealing with bots and scrapers.
From: "Constantine A. Murenin via NANOG" <nanog () lists nanog org>
Date: Thu, 17 Jul 2025 00:18:54 -0500
Hi,

Honestly, the best and safest way to combat this is to make the pages that the bots request very cheap and fast to generate; that way, you wouldn't really care whether they request those pages or not.

For example, a lot of software needlessly sets random or useless cookies, which prevents caching. What you could do is strip out all of the cookies in both directions, and then simply cache the generic responses that your backend generates for generic requests. This can be done with standard OSS nginx by clearing the Cookie and Set-Cookie headers.

If you really want to limit requests by User-Agent instead of by REMOTE_ADDR, as is normally done, standard OSS nginx can do that too; see http://nginx.org/r/limit_req_zone and nginx.org/r/$http_ for `$http_user_agent`. Be aware that you might inadvertently blacklist popular browsers this way.

C.

On Wed, 16 Jul 2025 at 11:49, Andrew Latham via NANOG <nanog () lists nanog org> wrote:
I just had an issue with a web-server where I had to block a /18 of a large scraper. I have some topics I could use some input on.

1. What tools or setups have people found most successful for dealing with bots/scrapers that do not respect robots.txt for example?
2. What tools for response rate limiting deal with bots/scrapers that cycle over a large variety of IPs with the exact same user agent?
3. Has anyone written or found a tool to concentrate IP addresses into networks for IPTABLES or NFT? (60% of IPs for network X in list so add network X and remove individual IP entries.)

--
- Andrew "lathama" Latham -
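
On question 3 above, the "60% of IPs for network X" rule can be sketched in a few lines of Python using only the standard-library `ipaddress` module. This is a rough illustration rather than an existing tool: the function name `concentrate`, the fixed /24 grouping, and the parameter names are all assumptions made for the example, and it handles IPv4 only.

```python
import ipaddress
from collections import defaultdict


def concentrate(ips, prefix_len=24, threshold=0.6):
    """Replace individual IPv4 addresses with their enclosing network
    when at least `threshold` of that network's addresses are listed.

    Hypothetical helper: the /24 grouping and 0.6 default are assumptions.
    """
    by_network = defaultdict(set)
    for ip in ips:
        addr = ipaddress.IPv4Address(ip)
        # strict=False masks off the host bits to get the enclosing network
        net = ipaddress.ip_network(f"{addr}/{prefix_len}", strict=False)
        by_network[net].add(addr)

    result = []
    for net, addrs in by_network.items():
        if len(addrs) / net.num_addresses >= threshold:
            # Enough of the network is listed: block the whole network
            result.append(net)
        else:
            # Otherwise keep the individual addresses as /32 entries
            result.extend(ipaddress.ip_network(str(a)) for a in addrs)

    # collapse_addresses merges duplicates and exactly adjacent entries
    # without adding any address that was not already covered
    return list(ipaddress.collapse_addresses(result))


if __name__ == "__main__":
    listed = [f"192.0.2.{i}" for i in range(160)] + ["198.51.100.7"]
    for net in concentrate(listed):
        print(net)
```

The resulting networks can then be fed into an nftables set or an iptables ruleset by whatever mechanism is already in use. Note that the threshold step deliberately over-blocks: once a network qualifies, addresses in it that never appeared in the list are blocked as well.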