Politech
mailing list archives
FC: Trapping anti-music piracy spiders: "RIAA Pit of Confusion"
From: Declan McCullagh <declan () well com>
Date: Sat, 17 May 2003 12:10:41 -0400
[One problem with this approach is that it looks like the RIAA is spending
more time targeting FTP sites than web sites. --Declan]
---
From: "Paul \"Evil Genius\" Music" <evlpawl () cox net>
To: "DeClan" <declan () well com>
Date: Sat, 17 May 2003 00:24:49 -0500
http://www.kuro5hin.org/print/2003/5/16/163447/493
RIAA Pit of Confusion (Culture)
By salimfadhley
Fri May 16th, 2003 at 10:13:06 PM EST
After reading about the RIAA threatening to sue yet another innocent
archive operator, I decided to take some direct action: It occurred to me
that the RIAA keep falsely accusing others of piracy because they put their
faith in an unintelligent spider - a fact which can be simply exploited to
make my servers into an RIAA no-go-zone...
Whilst spidering is nothing to worry about (and only to be expected on a
public site), the way the association fires off legal threats based on the
spider's results alone seems wrong. Since this spider does not actually look
at the whole title of the file, or even its content, I figured I could
have some fun at their expense:
What if I could write a `tarpit' script that could create a large number of
interlinked, automatically generated web sites? If their spider tried to
scan my server, it would be fooled into thinking that it had found a
treasure trove of MP3 sites. Anybody who took the time to look at the site
could see that it contains no pirate content at all.
How might the RIAA react to such a thing?
They could upgrade their spider so that it only recognises valid track names
that are in fact MP3s (e.g. it would know that
`elephant_wiggle-Madonna.mp3' is not a real Madonna song). This would limit
them to detecting only correctly named MP3 files, and force them to
use their spider responsibly.
Every single suspect site would need to be hand-checked in order to verify
that a genuine breach of copyright has taken place - this would
substantially decrease the return on investment for their spidering project
because it would be labour intensive, again forcing a more responsible
approach to detecting offenders.
They could blacklist my server to prevent their spider from looking at it
in future - that would be at least a small victory. If they blacklisted
enough servers it would be the same as giving up!
They could send me a legal nastygram instructing me to disable my tarpit...
Since I do not live in the USA, this might not be enforceable.
How it works
The Pit of Confusion is a pure PHP script that can automatically generate a
very large number of web sites with links to MP3s. Its settings file
contains lists of famous artist names and random words that can be used to
make silly song titles. There is also a download manager component -
designed to deliver MP3 files in the most inefficient way possible.
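To make the settings-file idea concrete, here is a rough sketch in Python
rather than the PHP of the original script, whose source is not shown here.
The word and artist lists and the function names are made up for
illustration; only the `elephant_wiggle-Madonna.mp3' naming pattern comes
from the article.

```python
import random

# Hypothetical settings; the original script keeps similar lists in a
# PHP settings file whose contents are not shown in the article.
ARTISTS = ["Madonna", "Elvis", "Prince"]
WORDS = ["elephant", "wiggle", "custard", "banjo", "teapot", "wobble"]


def silly_track_name(artist, rng):
    """Return a nonsense file name such as 'elephant_wiggle-Madonna.mp3'."""
    first, second = rng.sample(WORDS, 2)  # two distinct random words
    return f"{first}_{second}-{artist}.mp3"


def track_list(artist, count, rng):
    """Build `count` silly track names for one artist."""
    return [silly_track_name(artist, rng) for _ in range(count)]


rng = random.Random(2003)  # seeded only so the example is repeatable
for name in track_list("Madonna", 3, rng):
    print(name)
```

Every name contains the artist, so a spider that matches on artist names
alone would flag it, while a person reading the title would see at once
that it is nonsense.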
As with any web site, the action starts with a URL. Normally, the first
part of the URL just signifies the server on which the site runs; however,
I have used a Dynamic DNS service to encode the two key site parameters
into the hostname. I learnt that trick from this website. The first two
parts of the domain name tell the script how to build the page. If you visit:
http://madonna.ricky.music.stodge.org
It will show you `Ricky's' Madonna page. The script does not know anything
about Madonna or any of her songs - it just uses information provided at
run-time to set up the basic variables. Anything in the form of
a.b.music.stodge.org will get handled by the same server.
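The hostname trick can be sketched as follows. This is Python, not the
author's PHP, and parse_host and page_title are hypothetical names; the
only facts taken from the article are the a.b.music.stodge.org pattern and
that madonna.ricky yields "Ricky's" Madonna page.

```python
BASE_DOMAIN = "music.stodge.org"  # wildcard domain named in the article


def parse_host(host):
    """Split 'artist.owner.music.stodge.org' into (artist, owner).

    Returns None when the host does not have exactly two non-empty
    labels in front of the base domain.
    """
    host = host.lower().rstrip(".")
    host = host.split(":", 1)[0]  # drop an optional ':port' suffix
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        return None
    labels = host[: -len(suffix)].split(".")
    if len(labels) != 2 or not all(labels):
        return None
    artist, owner = labels
    return artist, owner


def page_title(host):
    """Build a page title from nothing but the requested hostname."""
    parsed = parse_host(host)
    if parsed is None:
        return "Unknown site"
    artist, owner = parsed
    return f"{owner.capitalize()}'s {artist.capitalize()} page"


print(page_title("madonna.ricky.music.stodge.org"))  # Ricky's Madonna page
```

In a real deployment the host would come from the HTTP Host header, so one
server can answer for any number of apparent sites.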
Notice how slowly the page loads - that is because there is a configurable
`annoying delay' built into each transaction. Assuming that the spider
system has a fixed maximum number of threads, it makes sense to tie these
up for as long as possible - but not so long as to deter a person wishing
to verify that there are no pirated files on the site.
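As a rough illustration of why the delay matters, the Python sketch below
pairs a configurable pause with a back-of-envelope bound on crawler
throughput. The five-second delay and the ten-thread crawler are assumed
figures; the article gives neither.

```python
import time

ANNOYING_DELAY_SECONDS = 5.0  # hypothetical value; the article gives no figure


def annoying_delay(seconds=ANNOYING_DELAY_SECONDS, sleep=time.sleep):
    """Pause before answering a request; `sleep` is injectable for testing."""
    sleep(seconds)


def max_pages_per_hour(threads, delay_seconds):
    """Upper bound on pages per hour for a crawler with `threads` workers
    when every request is held for at least `delay_seconds`."""
    return threads * 3600.0 / delay_seconds


# An assumed 10-thread spider facing a 5-second delay on every page.
print(max_pages_per_hour(threads=10, delay_seconds=ANNOYING_DELAY_SECONDS))  # 7200.0
```

Doubling the delay halves that bound, but it also doubles the wait for a
human visitor, which is the trade-off described above.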
Next it builds up a list of randomly named MP3 links that include the
chosen artist's name in the title. If you try to click on the link, instead
of delivering a pirated file it sends a non-copyrighted music file via a
download manager that ensures that the download will take a very long time.
The idea is to tie up as many threads as possible on whatever system is
doing the spidering.
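One way such a download manager might work is to trickle the file out in
small chunks with a pause before each. The sketch below is a guess at the
general technique in Python, not the author's PHP component; the chunk
size, the pause and the function names are all assumptions.

```python
import time


def trickle(data, chunk_size=64, pause=1.0, sleep=time.sleep):
    """Yield `data` in small chunks, pausing before each one.

    A web framework that streams a generator as the response body would
    send each chunk as it is produced, keeping the client connected.
    """
    for start in range(0, len(data), chunk_size):
        sleep(pause)
        yield data[start:start + chunk_size]


def minimum_transfer_seconds(size_bytes, chunk_size=64, pause=1.0):
    """Lower bound on how long a full download takes at these settings."""
    chunks = -(-size_bytes // chunk_size)  # ceiling division
    return chunks * pause


payload = b"\0" * 256  # stand-in for a non-copyrighted music file
received = b"".join(trickle(payload, chunk_size=64, pause=0.01))
print(len(received))                        # 256
print(minimum_transfer_seconds(3_000_000))  # 46875.0
```

With these assumed settings, a 3,000,000-byte file would hold a spider
thread for roughly 13 hours, unless the spider times out first.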
Finally it makes some links to a selection of other random sites produced
by the same system. The idea is to keep the spider in the tarpit for as
long as possible.
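The cross-linking step could look something like this in Python. The artist
and owner lists are invented for the example; the real script's settings
are not shown in the article.

```python
import random

BASE_DOMAIN = "music.stodge.org"             # wildcard domain from the article
ARTISTS = ["madonna", "elvis", "prince"]     # hypothetical
OWNERS = ["ricky", "sally", "bob", "alice"]  # hypothetical


def random_site_links(count, rng, exclude=None):
    """Return up to `count` distinct URLs of the form
    http://artist.owner.music.stodge.org, skipping `exclude` (the page
    currently being served) so a site never links to itself."""
    candidates = [
        f"http://{artist}.{owner}.{BASE_DOMAIN}"
        for artist in ARTISTS
        for owner in OWNERS
    ]
    if exclude in candidates:
        candidates.remove(exclude)
    return rng.sample(candidates, min(count, len(candidates)))


rng = random.Random(42)  # seeded only so the example is repeatable
here = "http://madonna.ricky.music.stodge.org"
for url in random_site_links(4, rng, exclude=here):
    print(url)
```

Because every generated hostname is served by the same script, each link
leads to another page that is built the same way, so a spider that follows
links never runs out of pages.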
Notes
This is just my first attempt. No doubt, by now more talented scripters can
see weaknesses in my plan - this is why I intend to share the source-code
of my project with anybody who wants it. If you want to help out, please
leave a message in this board and I will get back to ya!
Full discussion: http://www.kuro5hin.org/story/2003/5/16/163447/493
-------------------------------------------------------------------------
POLITECH -- Declan McCullagh's politics and technology mailing list
You may redistribute this message freely if you include this notice.
-------------------------------------------------------------------------
To subscribe to Politech: http://www.politechbot.com/info/subscribe.html
This message is archived at http://www.politechbot.com/
Declan McCullagh's photographs are at http://www.mccullagh.org/
Like Politech? Make a donation here: http://www.politechbot.com/donate/
-------------------------------------------------------------------------