Nmap Development mailing list archives
XML diff output format
From: "Adam Vartanian" <flooey () gmail com>
Date: Mon, 24 Jul 2006 03:22:49 -0400
Below is the output format I currently plan to implement for the new
Nmap XML output diffing tool. I'd appreciate any comments on possible
improvements, especially potential use cases where this format
wouldn't work well.
The format would be based on the existing Nmap XML output format, with
two modifications. First, basically every element and attribute would
become optional. Second, every element would have a new attribute,
tentatively named "diffstatus", which can take the values
"add", "delete", "modify", or "none". If absent, a value of "none" is
implied.
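As a rough illustration of how a consumer might read the proposed attribute, here is a minimal sketch using Python's standard xml.etree.ElementTree. The helper name diffstatus() and the set DIFFSTATUS_VALUES are hypothetical. The sample diff fragment is made up, and the attribute name itself is still tentative.

```python
import xml.etree.ElementTree as ET

# Values proposed for the (tentatively named) diffstatus attribute.
DIFFSTATUS_VALUES = {"add", "delete", "modify", "none"}


def diffstatus(elem: ET.Element) -> str:
    """Return the element's diffstatus, treating an absent attribute as "none"."""
    status = elem.get("diffstatus", "none")
    if status not in DIFFSTATUS_VALUES:
        raise ValueError(f"unknown diffstatus {status!r} on <{elem.tag}>")
    return status


# A made-up diff fragment in the proposed format.
diff = ET.fromstring(
    '<nmaprun><host><ports>'
    '<port protocol="tcp" portid="22" diffstatus="delete"/>'
    '<port protocol="tcp" portid="80"/>'
    '</ports></host></nmaprun>'
)
for port in diff.iter("port"):
    print(port.get("portid"), diffstatus(port))
# 22 delete
# 80 none
```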
Semantically, the diffstatus would indicate the type of action that
would need to be taken regarding this element to produce the second
file from the first file. "Add" means the node (and all of its
children) is missing from the first file and should be added.
"Delete" means the node is missing in the second file and should be
removed. "Modify" means the node's attributes need to be modified.
"None" means the node does not require any modifications ("none" nodes
may be present because they have children that require modification or
for some other structural reason). Any node that isn't present has an
implied value of "none".
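The mapping from "present in which file" to a diffstatus value can be sketched as follows. This is a simplification under stated assumptions. Each file is reduced to a dict from a node's key to its own attributes, and children are ignored, so a node whose attributes match but whose children changed would still have to be emitted as "none" by a real tool. The function name classify() and the sample service data are hypothetical.

```python
from typing import Dict, Hashable

Attrs = Dict[str, str]


def classify(first: Dict[Hashable, Attrs],
             second: Dict[Hashable, Attrs]) -> Dict[Hashable, str]:
    """Assign a diffstatus to every key found in either file.

    Only each node's own attributes are compared; a node whose attributes
    match but whose children differ would still need to appear (as "none")
    so that its changed children can be reached.
    """
    status: Dict[Hashable, str] = {}
    for key in first.keys() | second.keys():
        if key not in first:
            status[key] = "add"      # only in the second file
        elif key not in second:
            status[key] = "delete"   # only in the first file
        elif first[key] != second[key]:
            status[key] = "modify"   # in both, attributes differ
        else:
            status[key] = "none"     # in both, attributes identical
    return status


first = {"ssh": {"product": "OpenSSH", "version": "4.2"},
         "http": {"product": "Apache httpd"},
         "smtp": {"product": "Postfix smtpd"}}
second = {"ssh": {"product": "OpenSSH", "version": "4.3"},
          "https": {"product": "Apache httpd"},
          "smtp": {"product": "Postfix smtpd"}}
print(sorted(classify(first, second).items()))
# [('http', 'delete'), ('https', 'add'), ('smtp', 'none'), ('ssh', 'modify')]
```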
If a node has a value of "modify", only the attributes that are
present would require modification. If an attribute is not mentioned,
its value is the same as in the first file. If an attribute is
removed in the second file, it would be stated as having its default
value (or the empty string, for CDATA attributes with no default).
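The attribute list for a "modify" node, as described above, could be computed roughly like this. The sketch assumes attributes are plain string dicts. The function name modified_attributes() and the `defaults` mapping are hypothetical; a real tool would presumably take default values from the Nmap DTD.

```python
from typing import Dict, Optional


def modified_attributes(old: Dict[str, str], new: Dict[str, str],
                        defaults: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Attributes to list on a "modify" node.

    Changed or newly added attributes carry their new value; attributes that
    disappeared are reported with their default, or "" when no default is known.
    Unchanged attributes are left out, since their value is implied by the
    first file.
    """
    defaults = defaults or {}
    changes: Dict[str, str] = {}
    for name, value in new.items():
        if old.get(name) != value:
            changes[name] = value
    for name in sorted(old.keys() - new.keys()):
        changes[name] = defaults.get(name, "")
    return changes


old = {"name": "ssh", "product": "OpenSSH", "version": "4.2",
       "extrainfo": "protocol 2.0"}
new = {"name": "ssh", "product": "OpenSSH", "version": "4.3"}
print(modified_attributes(old, new))
# {'version': '4.3', 'extrainfo': ''}
```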
Determining if two nodes should be considered a modification or a
deletion followed by an addition is based on a key, which isn't
necessarily an attribute of the node. In general, this should be just
what you'd expect: for host nodes, it would be a combination of addr and
addrtype in the host's address child; for port nodes, it's the
combination of the protocol and portid attributes; for service nodes,
it's the name attribute. Some nodes (especially children of the host
element) have no key, and are always "modify" nodes. Whenever any
keyed node is to be displayed, it will be displayed with its key, so
that the node can be identified.
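The keys listed above could be extracted along these lines. This is only a sketch: node_key() is a hypothetical name, and it uses the first <address> child of a host. A host that carries several addresses (for example an IPv4 and a MAC address) would need an explicit rule that the proposal does not yet specify.

```python
import xml.etree.ElementTree as ET
from typing import Hashable, Optional


def node_key(elem: ET.Element) -> Optional[Hashable]:
    """Return the matching key for a keyed node, or None for unkeyed nodes."""
    if elem.tag == "host":
        # The key lives in a child, not on <host> itself. This sketch uses the
        # first <address> child; hosts with several addresses would need a rule.
        addr = elem.find("address")
        if addr is None:
            return None
        return ("host", addr.get("addr"), addr.get("addrtype"))
    if elem.tag == "port":
        return ("port", elem.get("protocol"), elem.get("portid"))
    if elem.tag == "service":
        return ("service", elem.get("name"))
    return None  # unkeyed nodes such as <osclass> are treated as "modify" nodes


host = ET.fromstring('<host><address addr="192.168.0.1" addrtype="ipv4"/></host>')
print(node_key(host))  # ('host', '192.168.0.1', 'ipv4')
print(node_key(ET.fromstring('<port protocol="tcp" portid="22"/>')))  # ('port', 'tcp', '22')
```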
There would be several output options provided, each of which is
roughly equivalent, but is optimized in different ways.
An option would be provided to show the values of attributes which
were not modified in "modify" nodes. This would aid human readers,
since seeing something like <osclass diffstatus="modify"
osgen="10.4.X" /> wouldn't give enough context to really know what had
changed. The tradeoff would be increased file size.
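The difference between the default output and the context-preserving option might look like the sketch below. It reuses the osclass example. The flag name show_unchanged and the function modify_element() are hypothetical. For brevity, the sketch ignores removed attributes. Restating an unchanged value does not alter what a patch would do, so the option only trades file size for readability.

```python
import xml.etree.ElementTree as ET
from typing import Dict


def modify_element(tag: str, old: Dict[str, str], new: Dict[str, str],
                   show_unchanged: bool = False) -> ET.Element:
    """Build a diffstatus="modify" node.

    By default only changed attributes are written. With show_unchanged=True,
    unchanged attributes are copied too; restating a value that did not change
    is harmless when patching, it only adds context and bytes.
    """
    elem = ET.Element(tag, diffstatus="modify")
    for name, value in new.items():
        if show_unchanged or old.get(name) != value:
            elem.set(name, value)
    return elem


old = {"type": "general purpose", "vendor": "Apple",
       "osfamily": "Mac OS X", "osgen": "10.3.X"}
new = dict(old, osgen="10.4.X")
print(ET.tostring(modify_element("osclass", old, new), encoding="unicode"))
# <osclass diffstatus="modify" osgen="10.4.X" />
print(ET.tostring(modify_element("osclass", old, new, show_unchanged=True),
                  encoding="unicode"))
```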
A set of options would be provided to either trim down or expand the
nodes that are compared. By default, timing-related nodes (like
taskprogress) and nodes with raw data values (like tcpsequence) would
not be diffed, but other nodes would be. Alternatively, the system
could compare all nodes, or ignore additional sets of nodes.
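One simple way to implement the trim/expand options is to prune ignored elements from both trees before comparing them. In the sketch below, the default set holds only the two node types named above as examples; the real default list would presumably be longer. The names DEFAULT_IGNORED and prune_ignored() are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Iterable

# Example defaults taken from the two node types named in the proposal;
# a real default list would presumably be longer.
DEFAULT_IGNORED = frozenset({"taskprogress", "tcpsequence"})


def prune_ignored(root: ET.Element,
                  ignored: Iterable[str] = DEFAULT_IGNORED) -> ET.Element:
    """Remove every element whose tag is in `ignored` (with its subtree) before diffing."""
    ignored = set(ignored)
    for parent in list(root.iter()):       # snapshot, since we mutate below
        for child in list(parent):
            if child.tag in ignored:
                parent.remove(child)
    return root


scan = ET.fromstring(
    '<nmaprun><taskprogress task="SYN Stealth Scan" percent="50.00"/>'
    '<host><os/><tcpsequence index="200"/></host></nmaprun>'
)
prune_ignored(scan)                          # default: drop timing/raw-data nodes
print(ET.tostring(scan, encoding="unicode"))
# <nmaprun><host><os /></host></nmaprun>
```

Comparing all nodes would correspond to prune_ignored(tree, ()), and ignoring additional sets to passing a union such as DEFAULT_IGNORED | {"os"}.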
One drawback that I can see for this method is that it provides no
context for nodes that are added, which means that patching the first
file with the diff file wouldn't necessarily produce the second file,
though it should produce a file that contains the same information.
However, as far as I can see, this should be acceptable. The ordering
of hosts, ports, and so forth "shouldn't" matter, since semantically
it's an unordered list, but I'd be interested if anyone actually does
care about ordering. This might be resolvable by having an option to
display placeholder nodes to give the proper context, either
displaying all existing nodes as empty or displaying the leading and
trailing node with keys.
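If ordering really does not matter, "contains the same information" could be checked with an order-insensitive comparison like the one sketched here. This assumes that every list of sibling elements is unordered. That is exactly the assumption the paragraph above questions, so a tool would need to exempt any element types where order turns out to matter. The names canonical() and same_information() are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def canonical(elem: ET.Element) -> Tuple:
    """Order-insensitive fingerprint of an element tree.

    Attributes and children are sorted, so two trees that differ only in
    the order of sibling elements produce equal fingerprints.
    """
    return (
        elem.tag,
        tuple(sorted(elem.attrib.items())),
        (elem.text or "").strip(),
        tuple(sorted(canonical(child) for child in elem)),
    )


def same_information(a: ET.Element, b: ET.Element) -> bool:
    return canonical(a) == canonical(b)


# A patch that appends an added port at the end ...
patched = ET.fromstring('<ports><port protocol="tcp" portid="22"/>'
                        '<port protocol="tcp" portid="80"/></ports>')
# ... versus a second file that lists it first.
second = ET.fromstring('<ports><port protocol="tcp" portid="80"/>'
                       '<port protocol="tcp" portid="22"/></ports>')
print(same_information(patched, second))    # True
```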
Another drawback is the handling of non-key attributes which aren't
present in the second file, as stated above. One option would be to
treat a removed attribute as a deletion of the node followed by an
addition, but if the node has children, that could add a lot of extra
space to the diff for the sake of a single removed attribute. Another
option would be a new attribute stating
which attributes (if any) had been removed from this node. My
personal preference for solving this would be a command line option
for selecting between the behavior stated above and the delete/add
behavior, or between all three if anyone feels they would have a use
for the third option.
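The third option (an attribute that names the removed attributes) could round-trip as sketched below. The attribute name "removedattrs" and the functions modify_with_removed_list() and apply_modify() are hypothetical; the proposal does not name them. The space-separated list is safe because XML attribute names cannot contain whitespace.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Sequence

# Hypothetical attribute name; the proposal does not name it.
REMOVED_ATTR = "removedattrs"


def modify_with_removed_list(tag: str, old: Dict[str, str], new: Dict[str, str],
                             key_attrs: Sequence[str] = ()) -> ET.Element:
    """Emit a "modify" node that names removed attributes instead of defaulting them."""
    elem = ET.Element(tag, diffstatus="modify")
    for name in key_attrs:                 # keyed nodes always show their key
        elem.set(name, new[name])
    for name, value in new.items():
        if old.get(name) != value:
            elem.set(name, value)
    removed = sorted(old.keys() - new.keys())
    if removed:
        elem.set(REMOVED_ATTR, " ".join(removed))
    return elem


def apply_modify(target: ET.Element, diff: ET.Element) -> None:
    """Patch `target` in place using a node produced by modify_with_removed_list()."""
    for name in diff.get(REMOVED_ATTR, "").split():
        target.attrib.pop(name, None)
    for name, value in diff.attrib.items():
        if name not in ("diffstatus", REMOVED_ATTR):
            target.set(name, value)


old = {"name": "ssh", "product": "OpenSSH", "extrainfo": "protocol 2.0"}
new = {"name": "ssh", "product": "OpenSSH", "version": "4.3"}
diff = modify_with_removed_list("service", old, new, key_attrs=("name",))
print(ET.tostring(diff, encoding="unicode"))
# <service diffstatus="modify" name="ssh" version="4.3" removedattrs="extrainfo" />
target = ET.Element("service", old)
apply_modify(target, diff)
print(target.attrib == new)                 # True
```

Unlike the default-value approach, this lets a patch distinguish "attribute removed" from "attribute set to its default", at the cost of one extra attribute on affected nodes.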
Comments or questions are welcome.
- Adam
_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
