On Thu, Dec 13, 2012 at 7:52 AM, Philip Whitehouse <philip () whiuk com> wrote:
I restate my email's second point.
Google is indexing robots.txt because (from all the examples I can see)
robots.txt doesn't contain a line to disallow indexing of robots.txt.
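
To make that point concrete, here is a minimal sketch of how a standard
parser reads such a file, using Python's urllib.robotparser. The helper
name, the example.com URL and the sample rules are assumptions for
illustration only. It shows how the Disallow rule is interpreted by one
parser; it says nothing certain about what Google's crawler or index
actually does with the file.

```python
from urllib.robotparser import RobotFileParser


def may_fetch_robots_txt(rules, agent="Googlebot"):
    """Return True if `rules` (robots.txt lines) let `agent` fetch /robots.txt.

    Hypothetical helper for illustration; example.com is a placeholder host.
    """
    parser = RobotFileParser()
    parser.parse(rules)  # parse in-memory lines, no network access
    return parser.can_fetch(agent, "https://example.com/robots.txt")


# A typical file: it hides /private/ but never mentions robots.txt itself.
typical = [
    "User-agent: *",
    "Disallow: /private/",
]

# The same file with an explicit rule covering robots.txt.
explicit = typical + ["Disallow: /robots.txt"]

print(may_fetch_robots_txt(typical))
print(may_fetch_robots_txt(explicit))
```

With the typical file nothing excludes /robots.txt, so a crawler that
follows the rules literally is free to fetch it; only the explicit rule
changes that.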
It is possible that some web sites provide actual content in a file that
happens to be called robots.txt (e.g. a website concerned with AI).
Could Google do better by removing the file? Sure. But as webmasters
haven't told them not to, even though they have provided other files not to
index, Google is doing exactly what they were asked.
Webmasters don't have to in the US - the Computer Fraud and Abuse Act
(CFAA) means Google (et al) must operate within the authority granted
by the webmasters. If that means the webmasters decide they don't want
their site crawled, then Google (et al) has exceeded its authority and
broken US Federal law. Just ask Weev.
This system needs a submission-based whitelist.
Full-Disclosure - We believe in it.