Re: Webtrends HTTP Server %20 bug (UTF-8)
From: "zsn" <zesnark () yahoo com>
Date: Sun, 10 Jun 2001 16:36:53 -0700
It's not at all clear to me a) that UTF-8 sequences are allowed in *any*
HTTP headers (request or response) or b) how a server or client would
1) MS Internet Explorer has an option to "Always send URLs as UTF-8". The help
text states that this option, "Specifies whether to use UTF-8, a standard
that defines characters so they are readable in any language. This enables
you to exchange Internet addresses (URLs) that contain characters from any
language." It is unclear whether IE sends UTF-8 URLs in requests, when
sending links via e-mail, when saving bookmarks, or in some other case.
2) The UTF-8 rules are kinda funny. The octets 0xFE and 0xFF are illegal
everywhere, and other octets may be illegal depending on their placement,
e.g. a "starting" octet with 2^7 on and 2^6 off (10xxxxxx, which is only
valid as a continuation), or a "subsequent" octet that doesn't have 2^7 on
and 2^6 off. I wouldn't be surprised if some UTF-8 parsing routines don't
handle illegal sequences gracefully, or if applications don't gracefully
trap errors reported by the UTF-8 parsing routines, etc. This might be
worth some testing.
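
For anyone who wants to poke at this, the placement rules above can be sketched roughly as follows. This is only an illustration, not a full validator: the function name is made up, the sequence length is taken from the number of leading 1 bits in the starting octet (so the 5- and 6-octet forms of the original UTF-8 definition pass structurally), and overlong encodings and out-of-range code points are deliberately not checked.

```python
def first_structural_error(data: bytes):
    """Return the offset where a structural UTF-8 problem is detected, or None.

    Hypothetical helper: it checks only the rules described in the post.
    The offset may equal len(data) when a sequence is cut off at the end.
    """
    i = 0
    while i < len(data):
        lead = data[i]
        if lead in (0xFE, 0xFF):      # illegal everywhere
            return i
        if lead & 0x80 == 0:          # 0xxxxxxx: plain ASCII, one octet
            i += 1
            continue
        if lead & 0xC0 == 0x80:       # 10xxxxxx where a starting octet is expected
            return i
        # Count the leading 1 bits of the starting octet: that is the
        # total length of the sequence (2 to 6 under this assumption).
        length = 0
        mask = 0x80
        while lead & mask:
            length += 1
            mask >>= 1
        # Every subsequent octet must have 2^7 on and 2^6 off.
        for j in range(i + 1, i + length):
            if j >= len(data) or data[j] & 0xC0 != 0x80:
                return j
        i += length
    return None
```

Feeding it things like a lone 0x80, a starting octet followed by ASCII, or a sequence truncated at the end of the buffer is probably a reasonable first set of test inputs for a server.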
I attempted to post a query regarding this a while back but it got rejected.
A very thorough and robust Unicode sanity-checking routine would be highly
useful (and probably such a thing exists; I've never had to deal with this).
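
As one example of such a routine already existing, Python's built-in UTF-8 decoder is strict by default and raises an error on malformed input, including overlong forms. A minimal sketch of trapping that error, applied to a percent-encoded URL path, might look like this (the helper names are made up for illustration; how any particular server decodes its URLs is not something this shows):

```python
from urllib.parse import unquote_to_bytes


def is_well_formed_utf8(data: bytes) -> bool:
    """True if the strict built-in decoder accepts the octets."""
    try:
        data.decode("utf-8")  # errors="strict" is the default
    except UnicodeDecodeError:
        return False
    return True


def path_is_well_formed_utf8(path: str) -> bool:
    """Undo %XX escapes first, then check the resulting octets."""
    return is_well_formed_utf8(unquote_to_bytes(path))
```

The point is simply that the caller has to catch the decoder's error and reject the request, rather than carrying on with a half-decoded string.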