WebApp Sec mailing list archives

Re: Hit Throttling - Content Theft Prevention


From: Nik Cubrilovic <cubrilovic () gmail com>
Date: Wed, 19 Oct 2005 17:03:36 +1000

On 19/10/05, Kurt Seifried <bt () seifried org> wrote:
> One effective strategy is to have hidden links (i.e. white text on a white
> background, or a 1x1 pixel image stashed somewhere) that regular visitors
> won't see at all. Have them go to a page with more links that specifically
> say "do not click this, you will be blocked", etc. These links go to a CGI;
> the CGI blocks that IP (firewall rules, Apache config, whatever). Make sure
> you stick these in various alphabetical positions and at both the top and
> bottom of the pages (many scrapers start at the top of a page or go in
> alphabetical order).
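The trap you describe could be sketched roughly like this. It is only a sketch: the trap path and the in-memory blocklist are made up for illustration, and in practice the block would be pushed out to firewall rules or the Apache config as you say.

```python
# Minimal sketch of the hidden-link trap: any client that requests
# the trap URL gets its IP added to a blocklist, and blocked IPs
# get a 403 from then on. TRAP_PATH is a hypothetical URL that only
# a bot following invisible links would ever reach.

TRAP_PATH = "/do-not-click.cgi"

def handle_request(path, client_ip, blocklist):
    """Return (status, blocklist) for one request."""
    if client_ip in blocklist:
        return 403, blocklist
    if path == TRAP_PATH:
        # A human never sees the hidden link, so this is a bot.
        return 403, blocklist | {client_ip}
    return 200, blocklist

blocked = set()
status, blocked = handle_request("/content.asp", "10.0.0.1", blocked)  # 200
status, blocked = handle_request(TRAP_PATH, "10.0.0.1", blocked)       # trapped, 403
status, blocked = handle_request("/content.asp", "10.0.0.1", blocked)  # now 403
```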

Thanks for the tips, Kurt. In one instance I have seen a crawler that
iterates IDs within a GET request, e.g.

/content.asp?bookid=1000&page=1
/content.asp?bookid=1000&page=2

and keep going until it reaches a 404, then increment the first ID.
In one instance we obfuscated the IDs by hashing them (MD5), but the
bot caught on and did the same thing with the hashed values. We had to
put that case down to a design fault in the application: the URLs were
easily guessable. When high-value content is at stake, the 'other
side' seems to get more sophisticated, as opposed to your standard
home user who has downloaded a website scraper from download.com. What
your tips are leading towards are ways to distinguish human visitors
from bots, which with some attackers simply becomes a game of
cat-and-mouse rather than a solution that can be handed to the client.
We have also tried a CAPTCHA, but that resulted in a noticeable drop
in website hits, with 20-30% of visitors not getting past the image
challenge screen. The CAPTCHA led to a one-time URL, which also posed
a problem when the user refreshed the page.
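On the guessable-URL point: a bare md5 of the numeric ID can be recomputed by anyone, so the bot just hashes the same sequence. A keyed hash with a server-side secret does not have that weakness. A rough sketch (the secret value and function names here are made up):

```python
import hashlib
import hmac

SECRET = b"server-side-secret"  # hypothetical; never exposed to clients

def public_id_md5(book_id):
    # What we tried: bare md5 of the numeric ID. Anyone can
    # recompute md5("1001"), md5("1002"), ... so the sequence of
    # URLs is still guessable.
    return hashlib.md5(str(book_id).encode()).hexdigest()

def public_id_hmac(book_id):
    # Keyed alternative: without the server secret, a bot cannot
    # derive the token for the next ID from the previous one.
    return hmac.new(SECRET, str(book_id).encode(), hashlib.md5).hexdigest()

# The unkeyed hash is trivially recomputable by the scraper:
assert public_id_md5(1000) == hashlib.md5(b"1000").hexdigest()
```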

I have contacted a number of appliance vendors to see if any offer a
transparent application-layer firewall that could identify bad bots
and drop them, but surprisingly not one had a solution. This is a
field we get more and more requests about - a pity that the big
vendors aren't taking up the opportunity, considering that most of the
companies affected, who need to protect their content, would pay
almost anything.
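Absent a vendor appliance, per-IP hit throttling can at least be done in the application itself. A minimal sliding-window sketch (the window and limit are made-up tuning values):

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # hypothetical tuning value
MAX_HITS = 100        # hypothetical per-IP limit per window

hits = defaultdict(deque)  # ip -> timestamps of recent requests

def allow(ip, now=None):
    """Return True if this request is under the per-IP hit limit."""
    now = time.time() if now is None else now
    q = hits[ip]
    # Drop timestamps that have slid out of the window.
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    if len(q) >= MAX_HITS:
        return False  # throttle: too many hits in the window
    q.append(now)
    return True
```

A bot iterating IDs trips the limit quickly, while a human browsing at normal speed never comes close to 100 hits a minute.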

Regards,
Nik

--
Nik Cubrilovic  -  <http://www.nik.com.au>

