Bugtraq mailing list archives
How Google indexed a file with no external link
From: Kevin <kevin () brasscannon net>
Date: Mon, 9 Jul 2001 21:47:44 -0400
I'm running a modest Apache 1.3.19 server on Mandrake 7.2 with a 2.4
kernel. No CGIs or PHP support, though I do have server-info and
server-status enabled for local reference only.
I noticed some hits in the Apache access_log for two files, index.old
and index.older, which were backups of index.html left in my docroot
directory. It wasn't hard to figure out that Google was directing
people to these files; what I couldn't understand was how Google knew
they were there.
Looking a bit deeper, I saw googlebot (and later, some ordinary visitors)
using this syntax:
http://handsonhowto.com/?M=A
http://handsonhowto.com/?S=D
...and if you try this yourself in Internet Explorer, you'll find that
Apache ignores my index.html and instead returns a formatted directory
listing of the docroot, as though there were no index page.
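A quick way to see how often this happens is to pull these requests out of the access_log. The Python sketch below assumes log lines written with the "custom" LogFormat from the appended httpd.conf and flags two kinds of request: hits on the backup files (the watched file names are an assumption; adjust the `watched` tuple) and requests whose entire query string is a single-letter `X=A` or `X=D` pair like the two URLs above. `find_hits` and `Hit` are hypothetical names.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Assumed to match the "custom" LogFormat in the appended httpd.conf:
#   %h %l %u %t %v "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<vhost>\S+) "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

class Hit(NamedTuple):
    host: str   # client address (%h)
    path: str   # requested URL path, including any query string
    agent: str  # User-Agent header

def find_hits(lines: Iterable[str],
              watched=("/index.old", "/index.older", "/index.oldest")) -> Iterator[Hit]:
    """Yield requests for the watched files or for bare X=A / X=D query strings."""
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # line doesn't fit the assumed format; skip it
        parts = m.group("request").split()
        if len(parts) < 2:
            continue  # malformed request line, e.g. a bare "-"
        path = parts[1]
        bare, _, query = path.partition("?")
        if bare in watched or re.fullmatch(r"[NMSD]=[AD]", query):
            yield Hit(m.group("host"), path, m.group("agent"))
```

Something like `for hit in find_hits(open("/var/log/access_log")): print(hit)` should list each match with its client address and User-Agent. Lines that don't fit the format are skipped silently rather than reported.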
The differences between the ?M and the ?S versions are not blatantly
obvious, at least not to me.
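For what it's worth, the query strings match the column-sorting links that mod_autoindex adds to its fancy listings in Apache 1.3 (`IndexOptions FancyIndexing` is set in the appended config). The letter before `=` picks the column (N for name, M for last-modified date, S for size, D for description) and the letter after it picks ascending (A) or descending (D) order, so `?M=A` and `?S=D` return the same listing sorted two different ways. The small Python sketch below only decodes that pair; it does not explain why the index page was skipped, and `decode_sort_query` is a hypothetical name.

```python
from typing import Optional, Tuple

# Column and order letters used by Apache 1.3 mod_autoindex sort links.
COLUMNS = {"N": "name", "M": "last modified", "S": "size", "D": "description"}
ORDERS = {"A": "ascending", "D": "descending"}

def decode_sort_query(query: str) -> Optional[Tuple[str, str]]:
    """Return (column, order) for a query like 'M=A', or None if it isn't one."""
    if len(query) != 3 or query[1] != "=":
        return None
    column = COLUMNS.get(query[0])
    order = ORDERS.get(query[2])
    if column is None or order is None:
        return None
    return column, order
```

So `decode_sort_query("M=A")` gives `("last modified", "ascending")` and `decode_sort_query("S=D")` gives `("size", "descending")`.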
I'm writing to Bugtraq in frustration because I can't find this documented
ANYWHERE, and it could be a nastier surprise to others than it was to me*.
What other little surprises like this exist, and can I do something in my
Apache config to take control of them?
*Before you tell me about robots.txt, .htaccess and so forth, let me
note that I know about those; and before I put this site up I realized
that anything I leave in my docroot is fair game. I'm only puzzled
because I can't find ANY information about these /?M or /?S thingamabobs.
I can't even RTFM, because I don't know what to call them!
P.S. I have since added .old, .older and .oldest to the list of file types
to be served as HTML, and created new versions of all three files that
redirect visitors to index.html instead.
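The post doesn't say how the redirect was done. One possibility, sketched below in Python under that assumption, is a tiny HTML page with a meta refresh; it only works because the `AddType text/html .old .older .oldest` line makes Apache label those files as HTML. The helper names are hypothetical, and `write_redirect_stubs` overwrites the files it names, so keep the real backups somewhere outside the docroot first.

```python
import html
from pathlib import Path
from typing import Iterable, List

def make_stub(target: str = "index.html") -> str:
    """Return a minimal HTML page that sends the browser on to `target`."""
    url = html.escape(target, quote=True)  # keep quotes from breaking the attribute
    return (
        "<html><head>\n"
        f'<meta http-equiv="refresh" content="0; url={url}">\n'
        "<title>Moved</title>\n"
        "</head><body>\n"
        f'<p>This page has moved to <a href="{url}">{url}</a>.</p>\n'
        "</body></html>\n"
    )

def write_redirect_stubs(docroot: Path,
                         names: Iterable[str] = ("index.old", "index.older", "index.oldest"),
                         target: str = "index.html") -> List[Path]:
    """Overwrite each named file in `docroot` with a redirect stub."""
    written = []
    for name in names:
        path = docroot / name
        path.write_text(make_stub(target), encoding="utf-8")
        written.append(path)
    return written
```

With the appended config, that would be something like `write_redirect_stubs(Path("/home/http"))`; the plain link in the body covers browsers that ignore the refresh.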
Sanitized Apache httpd.conf appended at moderator's request -- standard
Apache comments stripped out to reduce the size.
8<------ snip here ----------
ServerType standalone
ServerRoot "/usr/local/apache"
PidFile /var/log/httpd.pid
ScoreBoardFile /var/log/httpd.scoreboard
Timeout 300
KeepAlive On
MaxKeepAliveRequests 100
KeepAliveTimeout 15
MinSpareServers 2
MaxSpareServers 4
StartServers 3
MaxClients 50
MaxRequestsPerChild 0
ExtendedStatus On
Port 80
User webby
Group webby
ServerAdmin kevin () brasscannon com
ServerName howie.brasscannon.com
DocumentRoot "/home/http"
<Directory />
Options FollowSymLinks
AllowOverride None
</Directory>
<Directory /home/http/bcc/images>
Order Deny,Allow
Deny from All
AllowOverride AuthConfig
</Directory>
<Directory "/home/http">
Options Indexes FollowSymLinks MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
<IfModule mod_userdir.c>
UserDir public_html
</IfModule>
<IfModule mod_dir.c>
DirectoryIndex index.html
</IfModule>
AccessFileName .htaccess
<Files ~ "^\.ht">
Order allow,deny
Deny from all
</Files>
UseCanonicalName On
<IfModule mod_mime.c>
TypesConfig /usr/local/apache/conf/mime.types
</IfModule>
DefaultType text/plain
<IfModule mod_mime_magic.c>
MIMEMagicFile /usr/local/apache/conf/magic
</IfModule>
HostnameLookups Off
ErrorLog /var/log/error_log
LogLevel warn
LogFormat "%h %l %u %t %v \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" custom
LogFormat "%{Referer}i -> %U" referer
LogFormat "%{User-agent}i" agent
CustomLog /var/log/access_log custom
ServerSignature Off
<IfModule mod_alias.c>
Alias /icons/ "/usr/local/apache/icons/"
<Directory "/usr/local/apache/icons">
Options Indexes MultiViews
AllowOverride None
Order allow,deny
Allow from all
</Directory>
ScriptAlias /cgi-bin/ "/usr/local/apache/cgi-bin/"
<Directory "/usr/local/apache/cgi-bin">
AllowOverride None
Options None
Order allow,deny
Allow from all
</Directory>
</IfModule>
<IfModule mod_autoindex.c>
IndexOptions FancyIndexing
# Bunch of defaults provided by Apache - snipped
ReadmeName README
HeaderName HEADER
IndexIgnore .??* *~ *# HEADER* README* RCS CVS *,v *,t
</IfModule>
<IfModule mod_mime.c>
AddEncoding x-compress Z
AddEncoding x-gzip gz tgz
# Bunch of defaults provided by Apache - snipped
<IfModule mod_negotiation.c>
LanguagePriority en da nl et fr de el it ja kr no pl pt pt-br ru ltz ca es sv tw
</IfModule>
AddType application/x-tar .tgz
# Added by me AFTER seeing hits for these extensions:
AddType text/html .old .older .oldest
# This was NOT enabled:
#AddHandler send-as-is asis
</IfModule>
<IfModule mod_setenvif.c>
BrowserMatch "Mozilla/2" nokeepalive
BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
BrowserMatch "RealPlayer 4\.0" force-response-1.0
BrowserMatch "Java/1\.0" force-response-1.0
BrowserMatch "JDK/1\.0" force-response-1.0
</IfModule>
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from 172.16.0.1
</Location>
<Location /server-info>
SetHandler server-info
Order deny,allow
Deny from all
Allow from 172.16.0.1
</Location>
NameVirtualHost 172.16.0.1
<VirtualHost 172.16.0.1>
DocumentRoot "/home/http"
</VirtualHost>
<VirtualHost 172.16.0.1>
ServerName brasscannon.com
DocumentRoot "/home/http/bcc/com"
</VirtualHost>
<VirtualHost 172.16.0.1>
ServerName www.brasscannon.com
DocumentRoot "/home/http/bcc/com"
</VirtualHost>
<VirtualHost 172.16.0.1>
ServerName images.brasscannon.org
DocumentRoot "/home/http/bcc/images"
</VirtualHost>
<VirtualHost 172.16.0.1>
ServerName brasscannon.org
DocumentRoot "/home/http/bcc/org"
</VirtualHost>
<VirtualHost 172.16.0.1>
ServerName www.brasscannon.net
DocumentRoot "/home/http/bcc/com"
</VirtualHost>
<VirtualHost 172.16.0.1>
ServerName brasscannon.net
DocumentRoot "/home/http/bcc/com"
</VirtualHost>
# EOF EOF EOF
8<------ snip here ----------
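In this config, `Options Indexes` is what allows mod_autoindex to build a listing for a directory at all; it appears on `/home/http` and on the icons directory. A rough Python sketch for finding every such place in an httpd.conf follows. It is a line scanner under stated assumptions, not a real config parser: it tracks only `<Directory>`, `<Location>`, `<Files>` and `<VirtualHost>` sections, ignores `.htaccess` and `Include`d files, and treats `Options All` as enabling Indexes (as it does in Apache 1.3). `sections_with_indexes` is a hypothetical name.

```python
import re
from typing import List, Tuple

# Only these section types are tracked; <DirectoryMatch> and friends are not,
# so Options lines inside them are attributed to the enclosing section.
SECTION_OPEN = re.compile(r"<(Directory|Location|Files|VirtualHost)\s+([^>]*)>", re.I)
SECTION_CLOSE = re.compile(r"</(Directory|Location|Files|VirtualHost)>", re.I)

def sections_with_indexes(conf_text: str) -> List[Tuple[str, int]]:
    """Return (section, line number) for each Options line that enables Indexes."""
    stack: List[str] = []
    found: List[Tuple[str, int]] = []
    for lineno, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        opened = SECTION_OPEN.match(line)
        if opened:
            stack.append(f"<{opened.group(1)} {opened.group(2).strip()}>")
            continue
        if SECTION_CLOSE.match(line):
            if stack:
                stack.pop()
            continue
        words = line.split()
        if words[0].lower() != "options":
            continue
        # "Indexes", "+Indexes" and "All" turn listings on; "-Indexes" does not.
        if any(w.lstrip("+").lower() in ("indexes", "all") for w in words[1:]):
            found.append((stack[-1] if stack else "(server config)", lineno))
    return found
```

Running it over the text of the appended config should report the `<Directory "/home/http">` and `<Directory "/usr/local/apache/icons">` sections.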
