mailing list archives
Re: CGI.pm and the untrusted-URL problem
From: lstein () CSHL ORG (Lincoln Stein)
Date: Tue, 15 Feb 2000 12:12:00 -0500
The important point is that anything coming from the outside -- the
URL, the SERVER_PROTOCOL, the request body, the request MIME type --
should be treated as untrusted data. If you turn on taint checking,
Perl will refuse to take "dangerous actions" with untrusted data or
any data that has touched untrusted data. Modifying CGI.pm to be more
strict with the URL has the unwanted consequence of breaking people's
scripts and generating lots of support messages to me, without making
the CGI script any safer.
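A minimal sketch of what taint mode does (the SERVER_PROTOCOL value and the pattern are illustrative, not a prescription):

```perl
#!/usr/bin/perl -T
# Under -T, anything read from %ENV or the request is "tainted", and
# Perl refuses to let it reach a dangerous operation such as system().
my $proto = $ENV{SERVER_PROTOCOL};   # tainted: it came from the client
# system("echo $proto");             # would die: "Insecure dependency"

# Untainting requires an explicit match that captures only the
# characters you have decided to trust:
if (defined $proto && $proto =~ m{^(HTTP/\d+\.\d+)$}) {
    my $safe_proto = $1;             # capture groups are untainted
    print "protocol: $safe_proto\n";
}
```

The point is that the untrust decision is made explicitly by the script author, in one place, rather than by guessing which inputs a library should have filtered.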
Even if you have a Web browser that is "good", people can still telnet
to the web server and make illegal requests.
Kragen Sitaker writes:
support () dnaco net removed from addressee list because they probably
don't want to hear the whole conversation; I just want them to fix our
local CGI.pm so my web pages are safe. :)
Marc Slemko writes:
On Mon, 14 Feb 2000, Kragen Sitaker wrote:
It appears that this happens because the unencoded space is interpreted
by the HTTP server (Apache 1.3.6 in my tests) as separating the URL
from the protocol name. So the environment variable SERVER_PROTOCOL
gets set to everything following the space, followed by a space and the
actual protocol, such as "HTTP/1.0".
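The parse can be sketched like this (an illustrative split, not Apache's actual parsing code):

```perl
# A request line is split on spaces into method, URI, and protocol.
# An unencoded space in the URL therefore shifts everything after it
# into what the server takes to be the protocol field.
my $request = 'GET /page.cgi?q=evil payload HTTP/1.0';
my ($method, $uri, $protocol) = split / /, $request, 3;
print "URI:      $uri\n";        # /page.cgi?q=evil
print "Protocol: $protocol\n";   # payload HTTP/1.0
```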
Correct, this does appear to be a bug. I suspect that a lot of such bugs
will be found. Unfortunately.
However it is important to note that this does not exploit a bug in
Apache. Apache is choosing to deal with an illegal request in a perfectly
legitimate manner. At least, that is my understanding of what the spec
says; I haven't checked it closely WRT this particular issue.
I think you're right.
Part of Apache's functionality is to pass unknown methods and protocols on
to CGIs. It is arguable that Apache should explicitly reject any
request with more than two unencoded spaces in it.
Well, unknown methods I certainly agree with; but if the protocol is
completely unknown --- not even a version of HTTP --- how can Apache
reasonably think it knows what part of the request constitutes the URL,
or when it has reached the end of the request?
Apache, in this case, constitutes the interface between mutually
untrusted contexts: a Web browser and a CGI script. (And, as CERT
points out, there's a third context involved, trusted by neither of the
other two --- the URL provider.) As I see it, part of its purpose in
life is to restrict the information passed between these contexts to a
known and unsurprising set of channels.
RFC 1738 and RFC 2068 say that only a-z, 0-9, "+", ".",
and "-" are allowed in scheme names. Accordingly, I suggest the
following change to CGI.pm:
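A check in that spirit might look like the following sketch (illustrative only, not the actual proposed patch; the variable names are hypothetical):

```perl
# RFC 1738 limits scheme names to lowercase letters, digits,
# "+", "-", and ".".  Reject anything whose scheme strays outside
# that set before using the URL.
my $url = 'ht tp://example.com/';          # illegal: space in scheme
my ($scheme) = $url =~ m{^([^:/?#]+):};
unless (defined $scheme && $scheme =~ /^[a-z0-9+.-]+$/) {
    die "refusing URL with illegal scheme\n";
}
```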
Or it could simply encode things properly, as it should do for all
data supplied by the user that is output.
Filtering is often easier, however, as encoding can be very context-dependent.
I'm not sure what the proper encoding for scheme names would be. :)
self_url does appear to properly encode malicious data inserted in
other parts of the input URL.
The successful exploit requires a remarkable chain of extreme forgiveness:
1- The web browser must accept an illegal URL from (possibly valid,
although very unusual) HTML.
2- The web browser must send an illegal HTTP request with the illegal
URL, without %-encoding the URL to make it legal.
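For instance, a conforming browser would %-encode the space before sending the request (a trivial sketch; a real client would encode all reserved characters, not just spaces):

```perl
my $path = '/page.cgi?q=a b';
(my $encoded = $path) =~ s/ /%20/g;
print "$encoded\n";   # /page.cgi?q=a%20b
```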
Note that IE appears to be far better in making sure it only makes legal
requests. Good job Microsoft, in this particular situation.
What version of IE is better in this way? MSIE 3.0 is just as lenient
as Netscape 4.6 in this situation. I don't have any machines with MSIE
4 installed, because MSIE 4 makes me uncomfortable.
<kragen () pobox com> Kragen Sitaker <http://www.pobox.com/~kragen/>
The Internet stock bubble didn't burst on 1999-11-08. Hurrah!
The power didn't go out on 2000-01-01 either. :)
Lincoln D. Stein Cold Spring Harbor Laboratory
lstein () cshl org Cold Spring Harbor, NY