nanog mailing list archives

Re: Captchas on Cloudflare-Proxied Sites


From: "Constantine A. Murenin via NANOG" <nanog () lists nanog org>
Date: Wed, 2 Jul 2025 17:45:17 -0500

On Wed, 2 Jul 2025 at 14:38, William Kern via NANOG
<nanog () lists nanog org> wrote:


On 7/1/25 8:22 PM, Constantine A. Murenin via NANOG wrote:
But the bots are not a problem if you're doing proper caching and throttling.

Not all site traffic is cacheable or can be farmed out to a CDN.

That's just an excuse for inadequate planning and misplaced priorities.

If you start with the requirement that it all has to be cacheable,
then EVERYTHING can be cached, especially for ecommerce and catalogue
pages.

OSS nginx is free and relatively easy to use, with excellent
documentation, and it offers superb caching functionality.  You don't
need an external CDN to do the caching.
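For instance, a minimal nginx disk cache in front of an app server
could look like the following sketch (paths, zone name, timings, and
the upstream address are illustrative, not taken from this thread):

```nginx
# goes in the http{} context of nginx.conf
proxy_cache_path /var/cache/nginx levels=1:2 keys_zone=appcache:10m
                 max_size=1g inactive=60m use_temp_path=off;

server {
    listen 80;

    location / {
        proxy_pass http://127.0.0.1:8080;      # hypothetical app upstream
        proxy_cache appcache;
        proxy_cache_valid 200 301 10m;         # cache successful responses for 10 minutes
        proxy_cache_use_stale error timeout updating;  # serve stale if the backend struggles
        add_header X-Cache-Status $upstream_cache_status;  # HIT/MISS, handy for debugging
    }
}
```

With something like this, a bot hammering the same catalogue pages
mostly hits the nginx cache rather than the application.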

You can even cache search results, especially for non-logged-in
users.  Why would you NOT?  If, to quote arstechnica, "a GitLab link
is shared in a chat room", why would you want ANYONE to wait an extra
millisecond, let alone "having to wait around two minutes" for Anubis
proof-of-work, to access the result, if the result was already
computed and known, because it had already been assembled for the
person who posted the link in the first place?

These things could even be cached in the app itself, and even shared
between all logged-in and non-logged-in users, if performance and web
scale are paramount.  Otherwise, it can be architected to be cacheable
with nginx.
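Caching search only for anonymous visitors can be sketched in nginx
like this (the "session" cookie name is an assumption about the app;
any login cookie would do):

```nginx
# illustrative: cache /search responses, but only for visitors
# who do not carry a login/session cookie
location /search {
    proxy_pass http://127.0.0.1:8080;
    proxy_cache appcache;
    # if the (hypothetical) "session" cookie is set, skip the cache entirely
    proxy_cache_bypass $cookie_session;
    proxy_no_cache     $cookie_session;
    proxy_cache_key    "$scheme$host$uri$is_args$args";  # identical for all anonymous users
    proxy_cache_valid  200 5m;
}
```

Since the cache key contains only the URL and query string, every
anonymous visitor searching for the same thing is served the same
already-computed result.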


Dynamic (especially per-session) requests (think ecommerce) can't be cached.

Putting an item into the shopping cart is typically one of the more
resource-driven events.

We have seen bots that will select the buy button and put items into
the cart, possibly to see any discounts given.  You end up with
hundreds of active 'junk' cart sessions on a small site that was not
designed for that much traffic.

Why is the simple act of placing an item in a shopping cart a
resource-driven event?

This can literally be done on the front-end without any server
requests at all, let alone resource-driven ones.

If you DO store an expensive session on the server for this, instead
of in the browser, then you also likely expire said carts even for
users who intended to return and complete the purchase.  Does the
owner know?

Yes, it's more work to have a separate cookie cart for anonymous
users, but if that's a business requirement, why not?  This way, even
if someone comes back many months later, if they've never cleared the
cookies, their cart will still be there, waiting for them, at zero
cost to your shopping cart database.  Isn't that how it should be?
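A rough sketch of such a cookie-only anonymous cart (all names are
illustrative, and a real store would still validate SKUs and prices
server-side at checkout):

```javascript
// Purely client-side cart kept in a cookie: "add to cart" costs the
// server nothing, and the cart survives for months in the browser.

function parseCart(cookieValue) {
  // cookieValue is the raw value of a hypothetical "cart" cookie
  try {
    return JSON.parse(decodeURIComponent(cookieValue || "")) || {};
  } catch {
    return {}; // absent or corrupt cookie: start with an empty cart
  }
}

function addToCart(cart, sku, qty) {
  // returns a new cart object with qty of sku added
  const next = { ...cart };
  next[sku] = (next[sku] || 0) + qty;
  return next;
}

function serializeCart(cart) {
  // one-year Max-Age, so the cart is still there months later
  return "cart=" + encodeURIComponent(JSON.stringify(cart)) +
         "; Max-Age=31536000; Path=/; SameSite=Lax";
}

// In a browser, adding an item would be:
//   document.cookie = serializeCart(addToCart(parseCart(currentValue), "sku-42", 1));
```

The cart only ever touches the server when the user actually proceeds
to checkout, at which point it can be merged into a real session.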

Stores that empty your cart in 3 days, or which require captchas for
basic product viewing, are the best example of misplaced priorities.
I usually click the X button before they can complete their captcha.
And I won't bother adding anything to the shopping cart again if the
store is known for data loss.



Forcing the bot (or a legit customer) to create yet another login to
create a cart can help, but that generates pushback from the store
owner.  The owners don't want that until the payment-details phase, or
they want purchasers to be able to do a guest checkout.

They will point out that on amazon.com you don't have to log in to
put an item in the cart.


Rate limiting is not effective when they come from different IP
ranges.  The old days of using

Rate limiting would make sense for expensive operations like search
(and `git blame`), and it should be combined with caching, too,
especially if the results aren't personalised with AI or with past
purchases/views.
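A sketch of throttling just the expensive endpoints per client IP in
nginx (the zone name, rate, and burst values are made up for
illustration):

```nginx
# goes in the http{} context: 10 MB of state, max 2 requests/sec per client IP
limit_req_zone $binary_remote_addr zone=search:10m rate=2r/s;

server {
    location /search {
        limit_req zone=search burst=5 nodelay;  # absorb a small burst, then reject
        limit_req_status 429;                   # Too Many Requests instead of the default 503
        proxy_pass http://127.0.0.1:8080;       # hypothetical app upstream
    }
}
```

Cheap, cacheable pages stay unthrottled; only the endpoints that
actually burn CPU get a per-client ceiling.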

Things like adding an item to a cart should be a local event for
anonymous users, so there should be nothing on the server to
rate-limit in the first place.

Product listings and categories should 100% be cached, absolutely no
exceptions.  Search pages also absolutely have to be cached; I dunno
who ever thought of the brilliant idea that search somehow isn't
cacheable, especially on all those sites where it's 100% deterministic
and identical for all users.

If someone wants to get the entire site of all the products, I don't
see a good reason to preclude that.

In the old days, any vendor would be happy to send you their entire
catalogue of offerings, all at once: in print form in the US for the
major brands, and in Microsoft Excel for the more local vendors.  But
now suddenly we want to prevent people from viewing several products
at a time, or being able to shop the way they want to, or seeing the
prices for more than a handful of products at once?!  Misplaced
priorities, 100%.

Best regards,
Constantine.


a Class C (/24) as a rate-limit key are no longer effective.  The
bots come from all over the provider's address space (often Azure),
but can be from any of the larger providers, and often from different
regions.

If you throttle EVERYONE, then legit customers can get locked out
with 429s or even 503s.

And as has been pointed out, relying on the browser string is no
longer effective.  They use common strings and change them
dynamically.


Sincerely,

William Kern

PixelGate Networks.
_______________________________________________
NANOG mailing list 
https://lists.nanog.org/archives/list/nanog () lists nanog org/message/KA2KKQUKLYTXC3KR2JHVKZIZSSUGHY2C/

