CGI.pm and the untrusted-URL problem
From: kragen () POBOX COM (Kragen Sitaker)
Date: Mon, 14 Feb 2000 14:01:48 -0500

Description of the Problem

CGI.pm contains a method self_url which returns the URL with which the
script was called, including all of the data fields submitted ---
except for the .submit= field added by CGI.pm.

Normally, this is used something like this:

        my $self = self_url;
        print qq(<a href="$self#Section2">Section 2</a>\n);

If CGI.pm is running on Apache 1.3.6, probably other versions of
Apache, and possibly other Web servers, it is possible for a client to
cause self_url to include arbitrary sequences of characters at its
beginning, such as

        "><script language="JavaScript">evil_code()</script><a href="

which, if used in the manner described above, leads to the problem
described in CERT Advisory CA-2000-02, "Malicious HTML Tags Embedded in
Client Web Requests".
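
To make the failure concrete, here is a minimal sketch of the snippet
above with the self_url call replaced by a hard-coded string.  The
value of $self, including the host name and script path, is made up
for illustration; it only stands in for what a hostile request might
cause self_url to return.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a hostile self_url return value: everything
# before "://" is attacker-controlled (an assumption for illustration).
my $self = q{"><script>evil_code()<://www.example.com/cgi-bin/test.cgi};

# Same interpolation as in the usage example above.
my $html = qq(<a href="$self#Section2">Section 2</a>\n);
print $html;

# The first double quote in $self closes the href attribute early, so a
# browser parses a <script> tag where it should have seen part of a URL.
```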

Apparently, anything following an unencoded space in the URL used to
invoke the script ends up being inserted, unencoded but converted to
lower case, at the beginning of self_url's return value.

Unencoded spaces are, of course, illegal in URLs.  Most web browsers
accept them anyway in HREF attributes, and don't bother to %-encode
them when they send them in a GET request.

At least Netscape 4.6, MSIE 3.0, Mozilla M12, and Lynx 2.8.1rel.2
allow HREF attribute values to be delimited by ' single-quotes instead
of " double-quotes, which allows insertion of unencoded " double-quotes
into the URL --- which is crucial to exploiting this problem.  Lynx
2.8.1rel.2, however, strips the spaces from the URL found in HTML,
preventing it from being exploited via <A HREF=''>.


It appears that this happens because the unencoded space is interpreted
by the HTTP server (Apache 1.3.6 in my tests) as separating the URL
from the protocol name.  So the environment variable SERVER_PROTOCOL
gets set to everything following the space, followed by a space and the
actual protocol, such as "HTTP/1.0".
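
The splitting described above can be sketched roughly as follows.
This is a simplified model of a lenient server, not Apache's actual
parsing code; the function name lenient_parse and the sample request
are assumptions made for illustration.

```perl
use strict;
use warnings;

# Simplified model (an assumption, not Apache's code): the method and the
# URL each end at the first space, and whatever is left over is taken as
# the protocol.
sub lenient_parse {
    my ($request_line) = @_;
    my ($method, $url, $protocol) = split / /, $request_line, 3;
    return ($method, $url, $protocol);
}

# Hypothetical request whose URL contains an unencoded space.
my $line = 'GET /cgi-bin/test.cgi "><script>evil_code()</script> HTTP/1.0';
my ($method, $url, $protocol) = lenient_parse($line);

print "URL:             $url\n";
print "SERVER_PROTOCOL: $protocol\n";
```

In this model everything after the first space in the URL, plus the
real "HTTP/1.0", lands in the protocol field.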

Three of the four tested browsers (Netscape 4.6, MSIE 3.0, and Mozilla
M12) send the unencoded space in the request URL, which generates an
illegal HTTP Request-Line.

CGI.pm simply takes that environment variable, chops off everything
from the slash onwards, lowercases it, and returns the result as the
URL scheme.
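
A minimal sketch of that logic, built from the unpatched lines quoted
in the diff below.  The function name unpatched_scheme is made up, and
the port-443 check is left out to keep the example self-contained.

```perl
use strict;
use warnings;

# Sketch of the unpatched scheme derivation; the argument stands in for
# the SERVER_PROTOCOL environment variable.
sub unpatched_scheme {
    my ($server_protocol) = @_;
    my ($protocol, $version) = split('/', $server_protocol);
    return "\L$protocol\E";    # lowercased, but otherwise untouched
}

print unpatched_scheme('HTTP/1.0'), "\n";
print unpatched_scheme('"><SCRIPT>evil_code()</SCRIPT> HTTP/1.0'), "\n";
```

The first call prints "http".  The second prints the injected text up
to its first slash, lowercased, with the quote and angle brackets
intact.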

Suggested fixes

RFC 1738 and RFC 2068 say that only a-z, 0-9, "+", ".",
and "-" are allowed in scheme names.  Accordingly, I suggest the
following change to CGI.pm:

*** /usr/local/lib/perl5/5.00503/CGI.pm Tue May 18 00:04:20 1999
--- /home/kragen/lib/perl5/site_perl/5.005//CGI.pm      Mon Feb 14 12:07:37 2000
*** 2594,2600 ****
      return 'https' if $self->server_port == 443;
      my $prot = $self->server_protocol;
      my($protocol,$version) = split('/',$prot);
!     return "\L$protocol\E";

--- 2594,2602 ----
      return 'https' if $self->server_port == 443;
      my $prot = $self->server_protocol;
      my($protocol,$version) = split('/',$prot);
!     $protocol = lc $protocol;
!     $protocol =~ tr/-+.a-z0-9//cd;
!     return $protocol;

(Sorry --- I'm using Solaris diff, which doesn't have unified diff
format.)

This prevents the exploit, but of course the resulting URL is
incorrect.  It won't affect responses to well-formed HTTP requests,
which should never have anything other than HTTP for the $protocol to
begin with.
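
The effect of the patched lines can be checked in isolation.  This is
a sketch only: the function name patched_scheme is hypothetical, the
port-443 check is omitted, and the argument stands in for the
SERVER_PROTOCOL environment variable.

```perl
use strict;
use warnings;

# Sketch of the patched scheme derivation from the diff above.
sub patched_scheme {
    my ($server_protocol) = @_;
    my ($protocol, $version) = split('/', $server_protocol);
    $protocol = lc $protocol;
    # Delete every character that is not allowed in a scheme name.
    $protocol =~ tr/-+.a-z0-9//cd;
    return $protocol;
}

print patched_scheme('HTTP/1.0'), "\n";
print patched_scheme('"><SCRIPT>evil_code()</SCRIPT> HTTP/1.0'), "\n";
```

The hostile value collapses to a harmless, if meaningless, string of
letters, which is why the resulting URL is wrong but no longer
dangerous.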

It might be smarter to always return 'http' when not returning 'https';
I'm not presently aware of any protocols other than HTTP and SSL HTTP used with
CGI.  The current draft CGI spec says:

        Note that the scheme and the protocol are not identical; for
        instance, a resource accessed via an SSL mechanism may have a
        Client-URI with a scheme of "https" rather than "http".
        CGI/1.1 provides no means for the script to reconstruct this,
        and therefore the Script-URI includes the base protocol used.

. . . in other words, implementing self_url in a way that is guaranteed
to be correct for future non-HTTP CGI implementations is not possible.
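
The simpler alternative mentioned above, ignoring SERVER_PROTOCOL
altogether, might look like the sketch below.  The function name
fixed_scheme and its argument are hypothetical; the port test mirrors
the first line of the patched routine.

```perl
use strict;
use warnings;

# Sketch: never consult SERVER_PROTOCOL, so nothing from the request
# line can reach the scheme.
sub fixed_scheme {
    my ($server_port) = @_;
    return 'https' if $server_port == 443;
    return 'http';
}

print fixed_scheme(80), "\n";
print fixed_scheme(443), "\n";
```

Like the original port check, this guesses wrong for SSL on a
non-standard port, but it cannot be influenced by the client.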

The successful exploit requires a remarkable chain of extreme forgiveness:
1- The web browser must accept an illegal URL from (possibly valid,
   although very unusual) HTML.
2- The web browser must send an illegal HTTP request with the illegal
   URL, without %-encoding the URL to make it legal.
3- The HTTP server must accept the illegal HTTP request.
4- The HTTP server must invoke the CGI script with a nonsensical
   SERVER_PROTOCOL.
5- The CGI script must accept the nonsensical SERVER_PROTOCOL and use it to
   produce an illegal URL, which it must then embed in HTML it outputs.
6- The web browser must then trust the output of the CGI script in some
   fashion inappropriate to the supplier of the original URL.

Netscape 4.6, MSIE 3.0, and Mozilla M12 (and, I would guess, most Web
browsers) will happily perform steps 1 and 2; Apache 1.3.6 (and, I
would guess, most Web servers) will happily perform steps 3 and 4; any
program using CGI.pm and embedding self_url's return value in its
output will perform step 5; and as CERT advisory CA-2000-02 documents,
there are a wide variety of situations that can cause step 6 to
happen.

My patch above breaks the chain at step 5.  It would be nice to break
it at other steps as well.

The HTTP requests used in this exploit are broken: their Request-Line
has a protocol name that not only fails to be "HTTP", but fails to be
a valid protocol name at all.  Perhaps Apache
and other web servers should respond to such egregious protocol
violations with error messages, rather than passing the bogus data on
to CGI scripts.
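
As a rough illustration of the kind of check a server could make, the
sketch below accepts a Request-Line only if it has exactly three
space-free fields separated by single spaces and the last field looks
like an HTTP-Version.  The function name request_line_ok is made up,
the line is assumed to have had its CRLF removed, and the method and
URL are not otherwise validated.

```perl
use strict;
use warnings;

# Sketch of a strict Request-Line check: Method SP Request-URI SP
# HTTP-Version, with no whitespace inside any field.
sub request_line_ok {
    my ($request_line) = @_;
    return $request_line =~ m{\A\S+ \S+ HTTP/\d+\.\d+\z} ? 1 : 0;
}

for my $line ('GET /cgi-bin/test.cgi HTTP/1.0',
              'GET /cgi-bin/test.cgi "><script> HTTP/1.0') {
    printf "%-45s %s\n", $line, request_line_ok($line) ? 'accept' : 'reject';
}
```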

I have not sent copies of this mail to other web-server teams, because
I do not have the facilities or inclination to properly verify that
they are equally lenient.  Preliminary testing suggests that they are
not:

- IIS 5.0 responds, "The parameter is incorrect".
- Netscape-Enterprise/3.6 responds, "Your browser sent a
  message this server could not understand."
- Zeus 3.3 responds with a 400 Bad Request error.
- thttpd 2.15 responds with a 400 Bad Request error.

I also believe that Web browsers should take some steps to avoid
sending illegal HTTP requests; since the problem here happens only when
both the server and browser are trusted --- perhaps due to some earlier
authentication exchange between them --- while the URL is untrusted,
the browser should validate the URL, at least to the point of not
sending illegal requests to the server.
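
One way a client could keep such a URL from producing an illegal
request is to %-encode the offending characters before sending it.
The sketch below is deliberately minimal: the function name
encode_unsafe is hypothetical, and the set of characters it encodes
(control characters, space, double quote, "<", ">", and bytes outside
7-bit ASCII) is only a subset of what RFC 1738 calls unsafe.

```perl
use strict;
use warnings;

# Sketch: replace each listed byte with its %XX form and leave the rest
# of the URL alone.
sub encode_unsafe {
    my ($url) = @_;
    $url =~ s/([\x00-\x20"<>\x7f-\xff])/sprintf('%%%02X', ord($1))/ge;
    return $url;
}

print encode_unsafe('/cgi-bin/test.cgi "><script>'), "\n";
```

With the space encoded as %20, the Request-Line again contains exactly
two spaces and the injected text stays inside the URL.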


References

http://www.w3.org/CGI/ --- information about CGI
http://Web.Golux.Com/coar/cgi/draft-coar-cgi-v11-03-clean.html --- current
        draft specification for CGI
http://www.cert.org/advisories/CA-2000-02.html --- CERT advisory CA-2000-02,
        "Malicious HTML Tags Embedded in Client Web Requests"
RFC 1738, http://info.internet.isi.edu:80/in-notes/rfc/files/rfc1738.txt ---
        "Uniform Resource Locators (URL)" --- in particular, section 2.1,
        which defines the syntax of scheme names
RFC 2068, http://info.internet.isi.edu:80/in-notes/rfc/files/rfc2068.txt ---
        "Hypertext Transfer Protocol -- HTTP/1.1"
        --- in particular, section 3.2.1, which defines the syntax of
        URI scheme names identically to RFC 1738, but including
        uppercase US-ASCII letters.
        --- and section 5.1, which defines the syntax of HTTP Request-Lines,
        indicating (together with the sections defining URI syntax and
        section 3.1, defining HTTP-Version syntax) that they must
        contain exactly two spaces.
http://stein.cshl.org/WWW/CGI/ --- documentation for CGI.pm
http://www.apache.org/info/css-security/apache_specific.html --- changes made
        to Apache in response to CA-2000-02
http://www.netcraft.co.uk/survey/ --- Netcraft Web Server Survey,
        which lists the most popular web server software

<kragen () pobox com>       Kragen Sitaker     <http://www.pobox.com/~kragen/>
The Internet stock bubble didn't burst on 1999-11-08.  Hurrah!
The power didn't go out on 2000-01-01 either.  :)

