Security Basics mailing list archives

Re: Web bots - blocking the bad


From: Ben Carr <carr.287 () osu edu>
Date: Tue, 19 Aug 2003 13:34:15 -0400

At 05:30 AM 8/19/2003 -0400, Kelley Jr, Robert wrote:
I'd like to know your thoughts on spending time implementing measures to block the bad robots from scanning and indexing the web-site.

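The place to start is your robots.txt file. This file should reside in the root directory of your site (robots look for it only there) and contains rules that well-behaved robots are expected to follow. Compliance is voluntary, so a robot that chooses to ignore the file is not stopped by it.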
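Because robots consult the file only at the root of the site, a compliant robot derives its location from the scheme and host of whatever page it wants to fetch; a robots.txt placed in a subdirectory is ignored. A minimal Python sketch of that step, using a hypothetical helper robots_txt_url and the placeholder host www.example.com:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL a compliant robot would check for page_url.

    robots_txt_url is a hypothetical helper for illustration. It keeps the
    scheme and network location (host and any port) and replaces the path,
    query and fragment, because robots.txt is read only from the site root.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.example.com/SomeDir/page.html?x=1"))
# prints http://www.example.com/robots.txt
```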

The robots.txt format is rather simple; a basic file might look like this:
robots.txt
---------------
User-agent: googlebot
Disallow: /SomeDir

User-agent: *
Disallow: /
----------------
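This would allow Google's robot (googlebot) to scan and index everything on your site except SomeDir, and prevent all other robots from crawling the site at all. A robot obeys the record whose User-agent line matches its name and falls back to the * record only when no other record matches, which is why the "Disallow: /" rule does not apply to googlebot. Note that Disallow values are prefix matches: /SomeDir also covers /SomeDir.html and /SomeDirectory/, so use /SomeDir/ if you mean only that directory.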
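Python's standard library includes a robots.txt parser, urllib.robotparser, which shows how a compliant crawler applies these records. As a minimal sketch, assuming www.example.com as a placeholder host and SomeOtherBot as a made-up second robot, the following feeds it the example file and asks which URLs each robot may fetch:

```python
from urllib.robotparser import RobotFileParser

# The robots.txt example from this message, split into lines for the parser.
ROBOTS_TXT = """\
User-agent: googlebot
Disallow: /SomeDir

User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(ROBOTS_TXT)  # rules supplied directly, so no read() over the network

# www.example.com is a placeholder host; SomeOtherBot is a made-up robot name.
for agent in ("googlebot", "SomeOtherBot"):
    for path in ("/index.html", "/SomeDir/page.html"):
        allowed = parser.can_fetch(agent, "http://www.example.com" + path)
        print(f"{agent:<12} {path:<20} {'allowed' if allowed else 'blocked'}")
```

Run as-is, it reports googlebot allowed on /index.html but blocked under /SomeDir/, and SomeOtherBot blocked on both paths, because only the * record matches it.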

You can find more information and better examples at:
http://www.robotstxt.org/

-Ben Carr