Re: [NSE] http-feed.nse
From: David Fifield <david () bamsoftware com>
Date: Sat, 17 Aug 2013 10:14:38 -0700
On Sat, Aug 17, 2013 at 06:16:57PM +0300, George Chatzisofroniou wrote:
On Fri, Aug 16, 2013 at 08:38:21PM -0700, David Fifield wrote:
It didn't find feeds at secwiki.org. It doesn't seem to understand this
markup on the home page.
<link rel="alternate" type="application/atom+xml" title="SecWiki Atom feed"
What is special about this reference is that it contains an encoded ampersand.
For now, I updated the script to decode "&amp;" to "&", but this is something
that should be done in the httpspider and http libraries.
To be clear, the &amp; is not the reason the feed wasn't detected with
the previous version of the script, right? It was because the script
lacked an "application/atom+xml" pattern.
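The two issues in this exchange, a missing "application/atom+xml" type pattern and an entity-encoded href, can both be seen in a small sketch. It is written in Python with the standard-library html.parser rather than in Lua for NSE, and the feed-type list, the helper names, and the example href are hypothetical, not taken from http-feed.nse or from secwiki.org.

```python
from html.parser import HTMLParser

# Hypothetical list of feed MIME types; http-feed.nse's actual patterns may differ.
# Leaving out "application/atom+xml" reproduces the kind of miss described above.
FEED_TYPES = {"application/rss+xml", "application/atom+xml", "application/rdf+xml"}


class FeedLinkParser(HTMLParser):
    """Collect href values from <link rel="alternate" type="<feed type>"> tags."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag/attribute names and decodes character
        # references in attribute values, so "&amp;" in href arrives as "&".
        if tag != "link":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").lower().split()
        ctype = (attrs.get("type") or "").strip().lower()
        href = attrs.get("href")
        if "alternate" in rels and ctype in FEED_TYPES and href:
            self.feeds.append(href)


def find_feeds(html_text):
    parser = FeedLinkParser()
    parser.feed(html_text)
    parser.close()
    return parser.feeds


# Hypothetical markup modeled on the quoted <link> element.
page = ('<link rel="alternate" type="application/atom+xml" '
        'title="Example Atom feed" '
        'href="/index.php?title=Special:RecentChanges&amp;feed=atom">')
print(find_feeds(page))  # ['/index.php?title=Special:RecentChanges&feed=atom']
```

Because the parser, not the script, decodes the attribute, every kind of character reference is handled in one place, which matches the preference for not interpreting the string in the script itself.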
The right way to get rid of the &amp; is with an HTML parser. Since we
don't have that, I think I would prefer that we not interpret the string
at all in the script. If we handle &amp;, we really should handle &lt;
and &gt; and especially &quot; and &apos; that are likely to appear in
attribute values. (Granted, &amp; is the most likely and most
problematic of all of these.) But there are also numeric character
entities and the large number of HTML named entities too.
Decoding just &amp; and nothing else creates ambiguities; for example,
both of the input strings
map to the same output string
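One hypothetical pair of inputs (not necessarily the pair from the original message) shows the collision. The sketch is in Python; decode_amp_only is an illustrative name for the "decode only &amp;" approach, compared against the standard html.unescape.

```python
import html


def decode_amp_only(s):
    # Naive approach: decode only "&amp;" and leave every other reference alone.
    return s.replace("&amp;", "&")


# Hypothetical inputs: a correctly escaped literal "&lt;" text, and an encoded "<".
a = "&amp;lt;"
b = "&lt;"

print(decode_amp_only(a), decode_amp_only(b))  # &lt; &lt;   (collision)
print(html.unescape(a), html.unescape(b))      # &lt; <      (distinct)
```

After the naive decode, a caller can no longer tell whether the page meant the four characters "&lt;" or a single "<".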
I agree with you that it shouldn't be handled by just this script, in
just this place.
In HTML5 there are some rules about when an ampersand is just an
ampersand and when it is part of a character reference.
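Python's html.unescape is documented as applying the HTML5 rules for both valid and invalid character references, so it can illustrate when an ampersand stays literal. This is only a Python illustration of those rules, not something available to NSE scripts as written; also note that HTML5 has one extra legacy rule for attribute values (a reference without a semicolon followed by "=" or an alphanumeric is left alone) that html.unescape does not model.

```python
import html

examples = [
    "AT&T",                    # "&T" is not a known reference: ampersand stays literal
    "a &amp; b",               # named reference with semicolon
    "&#39;&#x27;",             # decimal and hexadecimal numeric references
    "&copy 2013",              # legacy named reference without semicolon is decoded
    "I'm &notit; I tell you",  # longest known prefix "&not" is decoded, rest kept
]

for s in examples:
    print(repr(s), "->", repr(html.unescape(s)))
```

The last two cases show why ad hoc decoding is hard to get right: whether "&" starts a reference depends on the full table of named references and on the legacy no-semicolon forms.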
Sent through the dev mailing list
Archived at http://seclists.org/nmap-dev/