Home page logo

fulldisclosure logo Full Disclosure mailing list archives

Re: Google's robots.txt handling
From: Gynvael Coldwind <gynvael () coldwind pl>
Date: Tue, 11 Dec 2012 00:33:40 +0100


Here is an example:

An admin has a public webservice running with folders containing
sensitive informations. Enter these folders in his robots.txt and
"protect" them from the indexing process of spiders. As he doesn't
want the /admin/ gui to appear in the search results he also puts his
/admin in the robots text and finaly makes a backup to the folder

If no one would know about a folder, why would one add it to
robots.txt in the first place?
But that's missing the point anyway - robots.txt is not a security mechanism.
If someone uses robots.txt as the only and last line of defense he
plainly doesn't understand what he's doing (especially that it's one
of the first files both pentesters & attackers look at).

If someone has an /admin/ site (which is a really easily guessable
name, checked by every web directory scanner out there) he cannot rely
on concealment*, but on proper user authentication using mechanisms
designed for such purpose (e.g. requiring a password).

(* for historical reasons there is a Polish IT term for such attempts
- "deep hiding", there's even a wiki page on that -

I'm wondering if, in perhaps .htaccess, one could allow ONLY site
crawlers access to the robots.txt file.  Then add robots.txt to
robots.txt...would this mitigate some of the risk?

1. It's still missing the point.
2. No, it wouldn't work in case of scanners that try to impersonate robots.

Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/

  By Date           By Thread  

Current thread:
[ Nmap | Sec Tools | Mailing Lists | Site News | About/Contact | Advertising | Privacy ]