Re: Google's robots.txt handling
From: Gynvael Coldwind <gynvael () coldwind pl>
Date: Tue, 11 Dec 2012 00:33:40 +0100
> Here is an example:
> An admin has a public web service running with folders containing
> sensitive information. He enters these folders in his robots.txt to
> "protect" them from the indexing process of spiders. As he doesn't
> want the /admin/ GUI to appear in the search results, he also puts his
> /admin in robots.txt and finally makes a backup to the folder
If no one knew about a folder, why would one add it to
robots.txt in the first place?
But that's missing the point anyway - robots.txt is not a security mechanism.
If someone uses robots.txt as the only and last line of defense, he
plainly doesn't understand what he's doing (especially since it's one
of the first files both pentesters & attackers look at).
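
To illustrate why that file is such an early stop, here is a minimal
sketch of pulling the Disallow entries out of a robots.txt body. The
function name and the sample content (including /backup/) are made up
for illustration, and the parsing is deliberately simplified.

```python
def disallowed_paths(robots_txt: str) -> list[str]:
    """Return the paths named in Disallow lines, in order of appearance."""
    paths = []
    for line in robots_txt.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:  # an empty Disallow value means "nothing is disallowed"
                paths.append(value)
    return paths


# Hypothetical robots.txt content, not taken from any real site.
SAMPLE = """\
User-agent: *
Disallow: /admin/
Disallow: /backup/   # hypothetical folder name
"""

print(disallowed_paths(SAMPLE))
```

Every path the admin wanted hidden from search engines ends up in that
list, readable by anyone who requests the file.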
If someone has an /admin/ site (which is a really easily guessable
name, checked by every web directory scanner out there), he cannot rely
on concealment*; he has to rely on proper user authentication using
mechanisms designed for that purpose (e.g. requiring a password).
(* for historical reasons there is a Polish IT term for such attempts
- "deep hiding", there's even a wiki page on that -
> I'm wondering if, perhaps in .htaccess, one could allow ONLY site
> crawlers access to the robots.txt file. Then add robots.txt to
> robots.txt...would this mitigate some of the risk?
1. It's still missing the point.
2. No, it wouldn't work against scanners that impersonate robots.
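
A rough sketch of point 2, assuming the restriction is implemented as a
User-Agent match (one plausible way to do it, not the only one). The
rule and the function name are hypothetical; the check is simulated
locally, with no real server involved.

```python
import re

# Hypothetical rule: serve robots.txt only if the User-Agent looks like a crawler.
CRAWLER_UA = re.compile(r"googlebot|bingbot", re.IGNORECASE)


def may_fetch_robots(headers: dict[str, str]) -> bool:
    """Mimic a server-side check based only on the User-Agent header."""
    return bool(CRAWLER_UA.search(headers.get("User-Agent", "")))


# An ordinary browser is refused.
browser = {"User-Agent": "Mozilla/5.0 (X11; Linux x86_64)"}

# A scanner just sends a crawler's User-Agent string; the header is
# entirely under the client's control.
scanner = {
    "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; "
                  "+http://www.google.com/bot.html)"
}

print(may_fetch_robots(browser))  # False
print(may_fetch_robots(scanner))  # True
```

Since the header is chosen by the client, the check only stops clients
that are honest about who they are.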
Full-Disclosure - We believe in it.
Hosted and sponsored by Secunia - http://secunia.com/