Nmap Development mailing list archives

[NSE] http-archive.nse
From: George Chatzisofroniou <sophron () latthi com>
Date: Mon, 23 Sep 2013 11:25:58 +0300

The attached script crawls previous versions of the target website
(fetched from archive.org) and extracts the links they contain. It then
checks whether those links still exist today and outputs the results.
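
The link-extraction step can be illustrated with a minimal sketch. It is written in Python rather than NSE Lua and is not taken from the attached script. It assumes that links inside an archived page are rewritten to the form /web/<timestamp>/<original URL>, optionally prefixed with the web.archive.org host. The names WAYBACK_PREFIX, LinkCollector and original_links are hypothetical.

```python
import re
from html.parser import HTMLParser

# Assumption: rewritten links look like
#   [http(s)://web.archive.org]/web/<digits><optional modifier>/<original URL>
WAYBACK_PREFIX = re.compile(r"^(?:https?://web\.archive\.org)?/web/\d+[a-z_]*/")


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def original_links(snapshot_html):
    """Return the original URLs linked from an archived page, in order.

    Links without the archive prefix (for example the archive's own
    navigation) are dropped, and duplicates are reported once.
    """
    collector = LinkCollector()
    collector.feed(snapshot_html)
    result = []
    for href in collector.links:
        stripped = WAYBACK_PREFIX.sub("", href)
        if stripped != href and stripped not in result:
            result.append(stripped)
    return result
```

Each URL returned this way would then be requested on the live target to decide whether it still exists.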

It is useful for discovering hidden pages that were used in the past but
still exist on the target. It also gives an overview of the target
website through time.

Unfortunately, the script is not working properly right now, because
archive.org keeps changing the content of its pages, so the script's
matching patterns need to be updated. I am posting it today because it
is the last day of GSoC. Consider this mail a checkpoint. I will make it
stable in the next

When the script was working properly, part of its output against
insecure.org looked like this:

 | web.archive.org/web/19981205075750/http://www.insecure.org/
 |     Alive links:
 |       insecure.org/myworld.html
 |       insecure.org/reading.html
 |       insecure.org/sploits.html
 |       insecure.org/credits.html
 | web.archive.org/web/19990125100235/http://www.insecure.org/
 |     Alive links:
 |       insecure.org/nmap/index.html
 | web.archive.org/web/20000301165730/http://www.insecure.org/
 |     Alive links:
 |       insecure.org/sploits_solaris.html
 |       insecure.org/nmap/nmap-fingerprinting-article.html
 |       insecure.org/nmap/index.html#download
 | web.archive.org/web/20020124070013/http://www.insecure.org/
 |     Alive links:
 |       insecure.org/sploits_linux.html
 |       insecure.org/nmap/

Note that this script is fairly intrusive for both archive.org and the
target website, which is why the maxyears and singleyears options exist
to limit the crawling.
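
This mail does not describe the exact semantics of maxyears and singleyears, so the following Python sketch only illustrates one possible way of limiting the crawl: keep a single snapshot per year and cap the number of years visited. The function pick_snapshots and its max_years parameter are hypothetical and are not the script's options. The only assumption taken from the output above is that snapshot timestamps start with a four-digit year.

```python
def pick_snapshots(timestamps, max_years):
    """Keep the earliest snapshot of each year, for at most max_years years.

    timestamps are strings such as "19981205075750" (YYYYMMDDhhmmss).
    """
    earliest_per_year = {}
    for ts in sorted(timestamps):
        year = ts[:4]
        # sorted() guarantees the first timestamp seen for a year is the earliest
        earliest_per_year.setdefault(year, ts)
    years = sorted(earliest_per_year)[:max_years]
    return [earliest_per_year[year] for year in years]
```

With a policy like this, the number of requests sent to archive.org is bounded by max_years no matter how many snapshots exist.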

Patrick and I think we can split the logic of this script into at
least three smaller scripts:

* http-archive, which retrieves all archived snapshots in a given interval.
* http-archive-liveness, which reports the 'alive' and 'dead' links found in the archives.
* http-archive-hidden, which reports the 'hidden' links found in the archives.
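
The three categories named above can be sketched as follows, in Python and not from the attached script. The sketch assumes that a 'hidden' link is one that is still alive but is no longer linked from the current site; the mail does not define the term, so that definition is a guess. The function classify_links and the is_alive callback are hypothetical. The callback stands in for whatever HTTP check the real script performs.

```python
def classify_links(archived_links, current_links, is_alive):
    """Split links found in archives into alive, dead and hidden.

    archived_links: links extracted from archived snapshots
    current_links:  links present on the site today (assumed input)
    is_alive:       callable returning True if the link still resolves
    """
    current = set(current_links)
    result = {"alive": [], "dead": [], "hidden": []}
    for link in archived_links:
        if is_alive(link):
            result["alive"].append(link)
            # Assumed definition: alive, but not reachable from today's pages
            if link not in current:
                result["hidden"].append(link)
        else:
            result["dead"].append(link)
    return result
```

Passing the liveness check in as a callable keeps the classification separate from the network code, which mirrors the proposed split into smaller scripts.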

George Chatzisofroniou

Attachment: http-archive.nse

Sent through the dev mailing list
Archived at http://seclists.org/nmap-dev/
