Re: [NSE] http-feed.nse
From: George Chatzisofroniou <sophron () latthi com>
Date: Sat, 17 Aug 2013 21:21:37 +0300
On Sat, Aug 17, 2013 at 10:14:38AM -0700, David Fifield wrote:
> To be clear, the &amp; is not the reason the feed wasn't detected with
> the previous version of the script, right? It was because the script
> lacked an "application/atom+xml" pattern.
It was both. Even with the right pattern, the URL would not be parsed
correctly. Try commenting out "l = l:gsub("&amp;", "&")" to see for yourself.
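As a rough illustration of the missing-pattern half of the problem, here is a minimal Lua sketch. The names feed_types and find_feed_type are hypothetical and are not taken from http-feed.nse; the sketch only assumes the script looks for feed MIME types in a line of HTML. It uses a plain-text find, because "+" is a magic character in Lua patterns.

```lua
-- Hypothetical list of feed MIME types; not copied from http-feed.nse.
local feed_types = { "application/rss+xml", "application/atom+xml" }

-- Return the first feed MIME type that occurs in the line, or nil.
local function find_feed_type(line)
  for _, t in ipairs(feed_types) do
    -- Fourth argument 'true' requests a plain substring search, so the
    -- "+" in the MIME type is not interpreted as a pattern quantifier.
    if line:find(t, 1, true) then
      return t
    end
  end
  return nil
end

local sample = '<link rel="alternate" type="application/atom+xml" href="/feed?a=1&amp;b=2">'
print(find_feed_type(sample))
```

Note that the href in the sample line still contains the raw "&amp;", which is the second half of the problem.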
> The right way to get rid of the &amp; is with an HTML parser. Since we
> don't have that, I think I would prefer that we not interpret the string
> at all in the script. If we handle &amp;, we really should handle &lt;
> and &gt; and especially &quot; and &apos; that are likely to appear in
> attribute values. (Granted, &amp; is the most likely and most
> problematic of all of these.) But there are also numeric character
> entities and the large number of HTML named entities too.
> Decoding just &amp; and nothing else creates ambiguities, for example
> both of the input strings
> map to the same output string
>
> I agree with you that it shouldn't be handled by just this script, in
> just this place.
Yes, you are right. I don't like it either. I removed it from the script.
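The ambiguity mentioned above can be sketched in a few lines of Lua. This is only an illustration: decode_amp_only is a hypothetical helper that mirrors the removed gsub call, and the two input strings are chosen for the example, not taken from the original message.

```lua
-- Hypothetical helper: decode only "&amp;" and nothing else.
local function decode_amp_only(s)
  -- gsub returns (string, count); the parentheses keep only the string.
  return (s:gsub("&amp;", "&"))
end

-- Correctly encoded form of the literal text "&lt;".
local a = decode_amp_only("&amp;lt;")
-- Correctly encoded form of the single character "<"; left untouched here.
local b = decode_amp_only("&lt;")

-- Both results are the same string, so the two inputs can no longer be
-- told apart after this partial decoding.
print(a, b, a == b)
```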
> In HTML5 there are some rules about when an ampersand is just an
> ampersand and when it is part of a character reference.
It looks like the HTML5 rules differ from HTML4. According to the HTML4 spec,
encoding an ampersand is always required, even though most developers ignore it.
Sent through the dev mailing list
Archived at http://seclists.org/nmap-dev/