Security Basics mailing list archives
Re: Web bots - blocking the bad
From: Ben Carr <carr.287 () osu edu>
Date: Tue, 19 Aug 2003 13:34:15 -0400
At 05:30 AM 8/19/2003 -0400, Kelley Jr, Robert wrote:
I'd like to know your thoughts on spending time implementing measures to block the bad robots from scanning and indexing the web-site.
The place to start is your robots.txt file. This file should reside in the root directory of your site and contains the rules that robots are expected to follow.
It follows a rather simple format, and the most basic example might look like this:

robots.txt
---------------
User-agent: googlebot
Disallow: /SomeDir

User-agent: *
Disallow: /
---------------

This would allow Google to scan/index everything on your site except SomeDir, and prevent all other robots from crawling the site.
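If you want to sanity-check a rules file before deploying it, Python's standard library ships a robots.txt parser (urllib.robotparser in Python 3; the older robotparser module in Python 2). Here is a minimal sketch, assuming the example rules above; the script name check_robots.py is just illustrative:

check_robots.py
---------------
from urllib.robotparser import RobotFileParser

# Feed the example rules to the parser directly, so no web
# server is needed for the test. For a live site you would
# call set_url(".../robots.txt") followed by read() instead.
RULES = [
    "User-agent: googlebot",
    "Disallow: /SomeDir",
    "",
    "User-agent: *",
    "Disallow: /",
]

rp = RobotFileParser()
rp.parse(RULES)

print(rp.can_fetch("googlebot", "/index.html"))     # True:  allowed
print(rp.can_fetch("googlebot", "/SomeDir/page"))   # False: disallowed dir
print(rp.can_fetch("SomeOtherBot", "/index.html"))  # False: blanket deny
---------------

Bear in mind that robots.txt only describes what a well-behaved robot will do; nothing in the protocol forces a bad robot to honor it.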
You can find more information and better examples at: http://www.robotstxt.org/

-Ben Carr
Current thread:
- Web bots - blocking the bad Kelley Jr, Robert (Aug 19)
- RE: Web bots - blocking the bad Chad (Aug 19)
- RE: Web bots - blocking the bad Horace Pinker (Aug 19)
- <Possible follow-ups>
- Re: Web bots - blocking the bad Ben Carr (Aug 19)
- RE: Web bots - blocking the bad Chad (Aug 19)
- RE: Web bots - blocking the bad Chad (Aug 19)
- RE: Web bots - blocking the bad Chris Santerre (Aug 19)
