[NSE] Unicode library
From: Daniel Miller <bonsaiviking () gmail com>
Date: Tue, 11 Mar 2014 06:38:30 -0500
Hello again, devs!
On the 19th of February, I committed another NSE library, unicode.lua
(http://nmap.org/nsedoc/lib/unicode.html). It is intended for
general-purpose lightweight encoding/decoding/transcoding of Unicode
and other character encodings. The original purpose was to replace all
the trivial null-filling and -skipping that scripts were using to
"decode" Windows Unicode strings (UTF-16 LE).
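To illustrate why null-skipping is not really decoding, here is a
minimal sketch in Python rather than NSE Lua. It does not use the
unicode.lua API; the helper name naive_decode is hypothetical and only
stands in for the old approach of dropping every second byte.

```python
def naive_decode(raw: bytes) -> bytes:
    """Hypothetical stand-in for null-skipping: keep only the low bytes."""
    return raw[0::2]

# Two names as they would appear on the wire in UTF-16 LE.
ascii_name = "ADMIN$".encode("utf-16-le")
viet_name = "Vi\u1ec7t Nam".encode("utf-16-le")  # "Việt Nam"

# For ASCII, the high byte of every code unit is null, so skipping works.
ascii_ok = naive_decode(ascii_name) == b"ADMIN$"

# For U+1EC7 the high byte carries data; dropping it corrupts the name.
naive_result = naive_decode(viet_name)

# A real transcode from UTF-16 LE to UTF-8 preserves the character.
proper_utf8 = viet_name.decode("utf-16-le").encode("utf-8")
```

Here naive_result comes out as b"Vi\xc7t Nam", which is neither valid
UTF-8 nor the original name, while proper_utf8 holds the same bytes as
the escaped example later in this message.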
As a result of this support, our SMB scripts should be able to
preserve internationalized Windows share names, user names, etc., as
well as authenticate with non-ASCII passwords. Displaying them to the
user is a separate problem: converting from UTF-16 to UTF-8 removes
the nulls, but the non-ASCII bytes are still escaped on output, so you
get "Vi\xe1\xbb\x87t Nam" instead of "Việt Nam". In light of that,
future improvements could be:
* Console/terminal encoding detection for Nmap generally, with full
UTF-8 support throughout. ICANN's new Unicode TLDs may prove difficult
for Nmap to scan otherwise.
* Better error checking and recovery for decoding errors. Currently,
errors result in a failure to decode, but the library also accepts
many malformed inputs without warning.
* Converting scripts that currently negotiate Windows OEM strings to
negotiate Unicode, since OEM code pages vary and cannot be negotiated.
* Normalization. This is unlikely to be complete, since Unicode
normalization is an enormous topic. Much better to find a good C
library that does this and incorporate it instead.
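A few of the points above can be sketched in Python. This is only an
illustration under assumptions, not how Nmap or unicode.lua do it: the
escape_non_ascii helper is hypothetical, and the byte 0x9b and the code
pages cp437 and cp850 were picked just to show one disagreement.

```python
import unicodedata

# UTF-8 bytes for "Việt Nam", as produced by a UTF-16 to UTF-8 conversion.
utf8_name = b"Vi\xe1\xbb\x87t Nam"

def escape_non_ascii(raw: bytes) -> str:
    """Hypothetical escaper: keep printable ASCII, hex-escape the rest."""
    return "".join(
        chr(b) if 0x20 <= b < 0x7F else "\\x%02x" % b
        for b in raw
    )

escaped = escape_non_ascii(utf8_name)   # what a byte-oriented escaper shows
readable = utf8_name.decode("utf-8")    # what a UTF-8 aware console could show

# OEM code pages vary: the same byte is a different character in each.
oem_byte = b"\x9b"
as_cp437 = oem_byte.decode("cp437")     # cent sign, U+00A2
as_cp850 = oem_byte.decode("cp850")     # o with stroke, U+00F8

# Normalization: one visible character, two different code point sequences.
composed = "\u1ec7"                                  # single code point
decomposed = unicodedata.normalize("NFD", composed)  # e + two combining marks
```

The composed and decomposed strings compare unequal until both are put
through the same normalization form, which is why comparing names
without normalizing can give surprising results.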
"The Absolute Minimum Every Software Developer Absolutely, Positively
Must Know About Unicode and Character Sets (No Excuses!)" -
"Unicode Security Guide" - http://websec.github.io/unicode-security-guide/
Sent through the dev mailing list
Archived at http://seclists.org/nmap-dev/