Nmap Development mailing list archives
[NSE] Sketch for XML/HTML parsing API
From: Lauri Kokkonen <lauri.u.kokkonen () gmail com>
Date: Thu, 19 Jan 2012 12:08:53 +0200
Hi,
First off, I am a student inspired by the possible GSOC money opportunity :P
I have come up with a sketch for an XML/HTML parsing API. The idea is to have a
method next() that returns the next bit of XML (start tag, attribute name,
etc.) from the input string. Along with next() there is state information for
keeping track of whether we are inside a tag or between tags (basically).
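To make the idea concrete, here is a rough pure-Lua sketch of such a next() method. Everything in it is an assumption for illustration only: the Parser table, the eat() helper, the in_tag state field, and the token type names ("start_tag", "attr_name", "attr_value", "tag_end", "end_tag", "text") are hypothetical. It is not robust against real-world HTML (no DOCTYPE, CDATA or entity handling).

```lua
-- Hypothetical sketch of the next() core; all names here are assumptions.
local Parser = {}
Parser.__index = Parser

function Parser.new(input)
  -- in_tag is the state flag: true between a start tag name and its '>'.
  return setmetatable({ s = input, pos = 1, in_tag = false }, Parser)
end

-- Try a pattern anchored at the current position. On a match, advance past
-- it and return the first capture (or the whole match if there is none).
function Parser:eat(pattern)
  local first, last, cap = self.s:find("^" .. pattern, self.pos)
  if not first then return nil end
  self.pos = last + 1
  return cap or self.s:sub(first, last)
end

-- Return the next token as a table { type = ..., data = ... }, or nil at
-- the end of the input.
function Parser:next()
  if self.pos > #self.s then return nil end
  local data
  if self.in_tag then
    self:eat("%s+")                          -- skip whitespace inside the tag
    if self:eat("/?>") then
      self.in_tag = false
      return { type = "tag_end" }
    end
    data = self:eat("=%s*\"([^\"]*)\"")      -- double-quoted value
        or self:eat("=%s*'([^']*)'")         -- single-quoted value
        or self:eat("=%s*([^%s>]+)")         -- unquoted value
    if data then return { type = "attr_value", data = data } end
    data = self:eat("[^%s=/>]+")
    if data then return { type = "attr_name", data = data } end
    self.pos = self.pos + 1                  -- skip a stray character
    return self:next()
  end
  if self:eat("<!%-%-.-%-%->") then          -- skip comments
    return self:next()
  end
  data = self:eat("</%s*([^%s>]+)%s*>")
  if data then return { type = "end_tag", data = data } end
  data = self:eat("<(%a[^%s/>]*)")
  if data then
    self.in_tag = true
    return { type = "start_tag", data = data }
  end
  data = self:eat("[^<]+")
  if data then return { type = "text", data = data } end
  self.pos = self.pos + 1                    -- a lone '<' is treated as text
  return { type = "text", data = "<" }
end

-- Usage: print every token of a small input.
local p = Parser.new('<a href="x.html" id=top>Link</a>')
for tok in function() return p:next() end do
  print(tok.type, tok.data)
end
```

The caller only ever sees a flat stream of tokens; the in_tag flag is what lets next() decide whether to look for attributes or for markup and text.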
Then we could build a set of useful methods around the core. For example,
find_start_tag() could find the next occurrence of the given start tag and
parse_attributes() could return a set of attributes given that we are
currently inside a tag. If needed, it should be possible to extend the
interface with a SAX-style facility or even add DOM-like features such as
parsing a subtree into a data structure (as was sketched in another
related thread on this list [1]).
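The two helpers described above could be sketched roughly as follows. To keep the example short and self-contained, it scans the input directly with Lua string patterns instead of layering on a next() tokenizer, which is a shortcut and not the proposed design. The Scanner table and the exact behavior (lowercased names, valueless attributes mapped to "") are assumptions. Comments are not skipped, so a tag inside <!-- ... --> would still be found.

```lua
-- Hypothetical sketch; Scanner and the details of both helpers are assumptions.
local Scanner = {}
Scanner.__index = Scanner

function Scanner.new(input)
  return setmetatable({ s = input, pos = 1 }, Scanner)
end

-- Advance to just after the name of the next start tag whose name is listed
-- in `names` (case-insensitive). Returns the lowercased name, or nil when
-- the input is exhausted.
function Scanner:find_start_tag(names)
  local wanted = {}
  for _, n in ipairs(names) do wanted[n:lower()] = true end
  while true do
    local first, last, name = self.s:find("<(%a[%w:_%-]*)", self.pos)
    if not first then
      self.pos = #self.s + 1
      return nil
    end
    self.pos = last + 1
    if wanted[name:lower()] then return name:lower() end
  end
end

-- Assuming the cursor is inside a start tag, collect its attributes into a
-- table keyed by lowercased attribute name and move past the closing '>'.
function Scanner:parse_attributes()
  local attrs = {}
  local s = self.s
  while self.pos <= #s do
    local _, ws_end = s:find("^[%s/]*", self.pos)   -- skip whitespace and '/'
    self.pos = ws_end + 1
    if s:sub(self.pos, self.pos) == ">" then
      self.pos = self.pos + 1
      break
    end
    local _, last, name = s:find("^([^%s=/>]+)", self.pos)
    if not last then break end                      -- malformed input: give up
    self.pos = last + 1
    -- Try double-quoted, single-quoted, then unquoted values.
    local _, vlast, v = s:find("^%s*=%s*\"([^\"]*)\"", self.pos)
    if not vlast then _, vlast, v = s:find("^%s*=%s*'([^']*)'", self.pos) end
    if not vlast then _, vlast, v = s:find("^%s*=%s*([^%s>]+)", self.pos) end
    if vlast then self.pos = vlast + 1 end
    attrs[name:lower()] = v or ""
  end
  return attrs
end

-- Usage: collect link-like URLs regardless of attribute order or case.
local html = [[<p>Hi</p><a class=nav href="/index.html">Home</a>
<IMG SRC='logo.png' alt="a > b"><script src=app.js></script>]]
local x = Scanner.new(html)
local urls = {}
while x:find_start_tag({"a", "img", "script"}) do
  local a = x:parse_attributes()
  if a["href"] then urls[#urls + 1] = a["href"] end
  if a["src"] then urls[#urls + 1] = a["src"] end
end
print(table.concat(urls, "\n"))
```

Handling the quoted cases before the unquoted one is what keeps a '>' inside an attribute value from ending the tag early.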
Something like the following would be useful for httpspider.lua:
  while x:find_start_tag({"a","img","script"}) do
    a = x:parse_attributes()
    if a["href"] then ... end
    if a["src"] then ... end
  end
or maybe:
  while x:find_attribute({"href","src"}) do
    url = x:next().data
    ...
  end
The following would be useful for http-generator.nse because it works
regardless of the order of the attributes:
  while x:find_start_tag({"meta"}) do
    a = x:parse_attributes()
    -- the generator is announced as <meta name="generator" content="...">
    if a["name"] == "generator" then ... end
  end
One option is to implement this completely in Lua, maybe with the help of
LPeg. Another option is to use a combination of C/C++ and Lua. Is XML
parsing needed elsewhere in Nmap? Looking at a few scripts that parse
XML/HTML files, I think that at least libraries like expat and libxml2 are
overkill for the purpose. For reference, the library approach was suggested
in threads [2] and [3].
Lauri
[1] [NSE] XML Parser RFC
http://seclists.org/nmap-dev/2011/q2/1281
http://seclists.org/nmap-dev/2011/q3/25
[2] Add XML support to NSE
http://seclists.org/nmap-dev/2009/q3/1093
[3] [NSE script] web application fingerprinting
http://seclists.org/nmap-dev/2008/q3/462
