Full Disclosure
mailing list archives
Re: Google's robot.txt handling
From: Thomas Behrend <webmaster () lord-pinhead com>
Date: Wed, 12 Dec 2012 07:20:14 +0100
We found this "security issue" a long time ago and used it ourselves to find hidden pages.
The only thing you can do is harden the directory against crawlers with mod_rewrite, or in the index.(php|pl|py|asp|etc) itself by checking the User-Agent string: if it doesn't contain something like the common browser strings, just send a 404 back, and Google and other crawlers will never index it.
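A minimal .htaccess sketch of that mod_rewrite approach (the /admin/ path and the User-Agent pattern are assumptions, and User-Agent strings are trivially spoofed, so this only deters crawlers, not people):

    # Answer 404 for /admin/ unless the User-Agent looks like a common
    # browser. "Mozilla|Opera" is an illustrative pattern, not a complete
    # list; R=404 needs a reasonably recent Apache.
    RewriteEngine On
    RewriteCond %{HTTP_USER_AGENT} !(Mozilla|Opera) [NC]
    RewriteRule ^admin/ - [R=404,L]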
Of course, you could also just rename /admin/ to /4dm1n/, or use a subdomain you never link to on your web page; in that case, split the web content and hide / in the subdomain's robots.txt, just in case the URL leaks.
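For that split subdomain, a robots.txt that hides / looks like this (the point being that the file reveals nothing about the layout):

    # robots.txt on the unlinked admin subdomain
    User-agent: *
    Disallow: /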
Another thing we saw working: just lock the directory via your web server's htaccess. Google didn't index the page because the crawler never got an HTTP 200 back; it got a 401.
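A sketch of that lock, assuming Apache with Basic auth (the file paths here are made up):

    # .htaccess inside the protected directory; without valid
    # credentials every request, crawlers included, gets a 401.
    AuthType Basic
    AuthName "Admin area"
    AuthUserFile /var/www/.htpasswd
    Require valid-user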
So, that's our way to "hide" our admin interfaces. It has worked so far, but even if someone finds one, the interface should be strong enough to withstand any attack. And of course, the login credentials shouldn't be "password", or printed at the top of the page as "Speak, friend, and come in" :)
So long
Thomas
On Tue, 11 Dec 2012 17:57:31 -0500, Jeffrey Walton <noloader () gmail com>
wrote:
On Tue, Dec 11, 2012 at 5:53 PM, Christian Sciberras
<uuf6429 () gmail com> wrote:
If you ask me, it's a stupid idea. :)
I prefer to know where I am with a service; and (IMHO) I would prefer to query (occasionally) Google for my CC instead of waiting for someone to start taking funds off it.
Hiding it only provides a false sense of security - it will last until someone finds the service leaking out CCs.
Agreed. How about search engine data from other crawlers that was not sanitized? This is especially the case with robots.txt. Can someone on the list please define a "good web crawler"?
Haha! Milk up the nose.
I think the problem here is that people are plain stupid and throw direct entries into robots.txt, whereas they should be sending wildcard entries.
Couple that with actually protecting sensitive areas, and it's a pretty good defence.
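One reading of that advice, as a sketch (paths invented for illustration): a direct entry hands over the exact location, while a generalized entry gives little away:

    # Revealing: names the sensitive directory outright
    User-agent: *
    Disallow: /secret-admin-panel/

    # Less revealing: exclude a generic parent and keep sensitive
    # content under it (protected by auth regardless)
    User-agent: *
    Disallow: /private/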
We now know you don't need a robots.txt for exclusion. Just ask Weev.
Jeff
On Tue, Dec 11, 2012 at 10:38 PM, Jeffrey Walton
<noloader () gmail com> wrote:
On Tue, Dec 11, 2012 at 4:11 PM, Mario Vilas <mvilas () gmail com>
wrote:
I think we can all agree this is not a vulnerability. Still, I have yet to see an argument saying why what the OP is proposing is a bad idea. It may be a good idea to stop indexing robots.txt to mitigate the faults of lazy or incompetent admins (Google already does this for many specific search queries), and there's not much point in indexing the robots.txt file for legitimate uses anyway.
I kind of agree here. The information is valuable for the reconnaissance phase of an attack, but it's not a vulnerability per se. Then again, what is to stop the attacker from fetching it himself/herself, since it's at a known location for all sites? In this case, Google would be removing aggregated search results (which means the attacker would have to compile them himself/herself).
Google removed other interesting searches, such as social security numbers and credit card numbers (or does not provide them to the general public).
Jeff
On Tue, Dec 11, 2012 at 2:01 PM, Scott Ferguson
<scott.ferguson.it.consulting () gmail com> wrote:
If I understand the OP correctly, he is not stating that listing something in robots.txt would make it inaccessible, but rather that Google indexes the robots.txt files themselves,
<snipped>
Well, um, yeah - I got that.
So you are what, proposing that moving an open door back a few
centimetres solves the (non) problem?
Take your proposal to its logical extension and stop all search engines (especially the ones that don't respect robots.txt) from indexing robots.txt. Now what do you do about Nutch, or even some perl script that anyone can whip up in 2 minutes?
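Something like this two-minute perl script is all it takes (a sketch; the module choice and usage are assumptions):

    #!/usr/bin/env perl
    # Fetch a site's robots.txt and print every Disallowed path --
    # exactly the reconnaissance under discussion.
    use strict;
    use warnings;
    use LWP::Simple qw(get);

    my $site = shift @ARGV or die "usage: $0 http://somesite.tld\n";
    my $txt  = get("$site/robots.txt") or die "no robots.txt found\n";

    for my $line (split /\n/, $txt) {
        print "$1\n" if $line =~ /^\s*Disallow:\s*(\S+)/i;
    }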
Security through obscurity is fine when coupled with actual security - but relying on it alone is just daft.
Expecting the world to change so that bad habits have no consequences is dangerously naive.
I suspect you're looking too hard for fault with Google - who are complying with robots.txt. Read the spec: it's about not following the listed directories, not about not listing the robots.txt. Next you'll want laws against bad weather and furniture with sharp corners.
Don't put things you don't want seen in places where they can be seen.
On Mon, Dec 10, 2012 at 8:19 PM, Scott Ferguson <
scott.ferguson.it.consulting () gmail com> wrote:
From: Hurgel Bumpf <l0rd_lunatic () yahoo com>
Date: Mon, 10 Dec 2012 19:25:39 +0000 (GMT)
------------------------------------------------------------------------
Hi list,
I tried to contact Google, but as they didn't answer my email, I am forwarding this to FD.
This "security" feature is not clearly a Google vulnerability, but it exposes website information that is not really intended to be public.
Conan the bavarian
Your point eludes me - Google is indexing something which is publicly available, e.g.: curl http://somesite.tld/robots.txt
So it seems the solution to the "question" you raise is, um, nonsensical.
If you don't want something exposed on your web server, *don't publish references to it*.
The solution, which should be blindingly obvious, is don't create the problem in the first place. Password-protect sensitive directories (htpasswd) - then they don't have to be excluded from search engines (because listing the inaccessible in robots.txt is redundant). You must have missed the first day of web school.
_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/