Nmap Development mailing list archives

Re: [RFC] PCRE MATCHLIMIT and the use of greedy quantifiers (-sV scans)
From: doug () hcsw org
Date: Fri, 3 Apr 2009 01:57:17 +0000

Hi Brandon,

On Thu, Apr 02, 2009 at 11:09:57PM +0000 or thereabouts, Brandon Enright wrote:
In scanning the thousands of services on our network I regularly run
into the following error:

Warning: Hit PCRE_ERROR_MATCHLIMIT when probing for service http with the regex '^HTTP/1\.0 \d\d\d .*\r\n.*\r\n\r\n.*\t<title>Strongdc\+\+ webserver - Login Page</title>\t'

There are a number of match lines that trigger this; here are a couple more:

Warning: Hit PCRE_ERROR_MATCHLIMIT when probing for service http with the regex '^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)\nContent-type: text/html\n\n<HEAD>\n<TITLE>\nServer Administration\n</TITLE>\n\n<META NAME=\"IBMproduct\" CONTENT=\"ADSM\">\n<META NAME=\"IBMproductVersion\" CONTENT=\"([\d.]+)\">.*Storage Management Server for AIX'

Ya, I've seen this warning before too although I can't remember the
service or match line that triggered it. We should definitely
avoid backtracking as much as possible.

I always try to make sure that there are no such segments
between .* groups if the s modifier is in use. Avoiding the
s modifier lets some matches fail early, but I think the
s modifier is very important in creating robust match lines
for some protocols like HTTP. Here is a typical HTTP s modifier
match:

match http m|^HTTP/1\.0 200 .*\r\nServer: Allegro-Software-RomPager/([\w-_.]+)\r\n.*<TITLE>SONY NSP-100 Main 
              \------------/  \--------------------------------------------------/  
                    1                                    2                                             3

1: Matches the service, HTTP
2: Gets the httpd used, no matter where the server: line appears in the
   header, and regardless of the presence/ordering of other http headers.
3: Some unique string that confirms the device branding.
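
To make the three parts concrete, here is a minimal Python sketch of the same
idea. It is only an illustration: the sample banners are hypothetical, the
character class is reordered to [\w._-] because Python's re module rejects the
range \w-_, and the pattern stops at "Main", where the match line is cut off in
the archive. re.DOTALL plays the role of the s modifier.

```python
import re

# Adapted from the match line above (see the assumptions in the lead-in).
PATTERN = re.compile(
    r"^HTTP/1\.0 200 .*\r\nServer: Allegro-Software-RomPager/([\w._-]+)\r\n"
    r".*<TITLE>SONY NSP-100 Main",
    re.DOTALL,  # like the s modifier: "." also matches newlines
)

# Hypothetical banners: same content, Server header in different positions.
BODY = "\r\n<HTML><HEAD><TITLE>SONY NSP-100 Main</TITLE></HEAD></HTML>"
server_first = (
    "HTTP/1.0 200 OK\r\n"
    "Server: Allegro-Software-RomPager/2.10\r\n"
    "Content-Type: text/html\r\n" + BODY
)
server_last = (
    "HTTP/1.0 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Server: Allegro-Software-RomPager/2.10\r\n" + BODY
)


def rompager_version(banner):
    """Return the captured httpd version, or None if the banner does not match."""
    m = PATTERN.search(banner)
    return m.group(1) if m else None


print(rompager_version(server_first))
print(rompager_version(server_last))
```

Both banners yield the same captured version, which is the point of part 2:
the Server: line is found wherever it sits among the other headers.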

I can't really think of any cases where it's necessary to have
something like .*\r\n.* in a match line with an s modifier.

Warning: Hit PCRE_ERROR_MATCHLIMIT when probing for service http with the regex '^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)\r?\n.*<TITLE>\nServer Administration\n</TITLE>.*<META NAME=\"IBMproductVersion\" CONTENT=\"([\d.]+)\">.*<TITLE>\nAdministrator Login\n</TITLE>.*Storage Management Server for Windows'

The issue is in the construction of the match: overuse or poor use of
the greedy quantifier ".*", as in:

"HTTP/1\.0 \d\d\d .*\n.*Server:"

The problem arises when matching against services that have a large
number of partial matches between the .* constructs that force the
engine to backtrack too much while trying to match.
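
A rough way to see why this blows up: with the s modifier, every \n in the
response is a candidate split point for ".*\n.*", and after each split the
second .* can retry the following token at every later offset. The Python
sketch below only counts those (split, offset) pairs for a response where the
following token keeps failing. It is a back-of-the-envelope model, not PCRE's
actual match-limit accounting, and estimate_attempts is a made-up helper name.

```python
def estimate_attempts(text, sep="\n"):
    """Count (split point, retry offset) pairs for '.*<sep>.*TOKEN'.

    Assumes dot-matches-newline and that TOKEN never completes a match, so a
    naive backtracking matcher ends up trying every pair.
    """
    attempts = 0
    for i in range(len(text)):
        if text.startswith(sep, i):
            # Offsets at which TOKEN could be retried after this split,
            # including the end of the text.
            attempts += len(text) - (i + len(sep)) + 1
    return attempts


for lines in (100, 200, 400):
    print(lines, estimate_attempts("a\n" * lines))
```

In this model, doubling the number of lines quadruples the count, and each
additional .* segment in the pattern multiplies the work again.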

Like I said, I try to avoid matches like that but I think I know some
ways they can occur:

1) A match line originally has no s modifier but later was changed to
   have an s modifier without removing problematic sections of the
   match like .*\n.*

2) When constructing an s match line, someone had a more unique
   section like .*X-Unique-header: blahblah\r\n.* but found that
   the X-Unique-header: wasn't always there and removed the header
   part but left the line terminator .*\r\n.* in the final version.

In all the cases I've run into this issue I've been able to fix the
match by using atomic grouping and lazy quantification.

I think we can make the following substitution on all s modifier
match lines (untested):


The resulting match lines will match strict supersets of the previous
match lines' matches (meaning anything that used to match will still
match plus at least 1 more, newline replaced with empty string) and I
don't think these segments add any important value to the matching
process. There may be unusual cases I'm not considering at the moment
though, perhaps very tough to match services whose only identifying
characteristics are the order and count of their newlines, so I think
these should be processed on a case-by-case basis as you appear to be doing.
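
The superset claim can be spot-checked with a small Python sketch. It assumes
the substitution in question is the one raised later in this message, replacing
.*\n.* (or .*\r\n.*) with .*, and it uses re.DOTALL in place of the s modifier.
drop_newline_segment and the sample banners are hypothetical.

```python
import re


def drop_newline_segment(pattern):
    """Collapse each unescaped '.*\\n.*' or '.*\\r\\n.*' in a pattern to '.*'."""
    return re.sub(r"(?<!\\)\.\*(?:\\r)?\\n\.\*", ".*", pattern)


old = r"^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)"
new = drop_newline_segment(old)

two_lines = "HTTP/1.0 200 OK\nServer: ADSM_HTTP/0.1\n"
one_line = "HTTP/1.0 200 OK Server: ADSM_HTTP/0.1"

for banner in (two_lines, one_line):
    old_hit = re.search(old, banner, re.DOTALL) is not None
    new_hit = re.search(new, banner, re.DOTALL) is not None
    print(repr(banner), "old:", old_hit, "new:", new_hit)
```

The two-line banner matches both patterns; the one-line banner matches only
the rewritten pattern, which is the "newline replaced with empty string" case.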

Here is a match diff:

* -match http m|^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)\r?\n.*<TITLE>\nServer ...snip...
* +match http m|^HTTP/1\.0 \d\d\d (?>.*?\n).*Server: ADSM_HTTP/([\d.]+)\r?\n.*<TITLE>\nServer ...snip...
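
For readers who want to try the diff outside PCRE, here is a hedged Python
sketch. Atomic groups are only available in Python's re module from 3.11 on,
so the sketch also uses [^\n]*\n, which can likewise only end at the first
newline. The banner is hypothetical and the patterns are shortened to the
part shown before the snip.

```python
import re
import sys

OLD = r"^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)"
ATOMIC = r"^HTTP/1\.0 \d\d\d (?>.*?\n).*Server: ADSM_HTTP/([\d.]+)"
# Assumed portable spelling: consume up to and including the first newline.
PORTABLE = r"^HTTP/1\.0 \d\d\d [^\n]*\n.*Server: ADSM_HTTP/([\d.]+)"


def version(pattern, banner):
    """Return the first capture group, or None when there is no match."""
    m = re.search(pattern, banner, re.DOTALL)
    return m.group(1) if m else None


banner = "HTTP/1.0 200 OK\nDate: today\nServer: ADSM_HTTP/0.1\n"

print("old     ", version(OLD, banner))
print("portable", version(PORTABLE, banner))
if sys.version_info >= (3, 11):
    print("atomic  ", version(ATOMIC, banner))
```

On this banner all variants capture the same version. The difference is that
the first .* no longer has every later newline as a candidate stopping point.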

What about just replacing .*\n.* with .* ?

But remember that in non s modifier match lines it is very important
to keep these segments as is to ensure it still matches.
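
A short Python sketch of why the segment matters without the s modifier: by
default "." does not match \n, so a single .* cannot skip over a header line.
The banner and header names are hypothetical.

```python
import re

banner = "HTTP/1.0 200 OK\r\nDate: today\r\nServer: Example/1.0\r\n"

with_segment = r"^HTTP/1\.0 \d\d\d .*\r\n.*\r\nServer: "
without_segment = r"^HTTP/1\.0 \d\d\d .*\r\nServer: "

# No flags: the equivalent of a match line without the s modifier.
print(bool(re.search(with_segment, banner)))
print(bool(re.search(without_segment, banner)))

# With re.DOTALL (the s modifier), the shorter pattern is enough.
print(bool(re.search(without_segment, banner, re.DOTALL)))
```

Without the flag, only the pattern that spells out the intermediate line
matches; with it, the extra .*\r\n adds nothing except backtracking.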

Rather than fix the handful of these that happen to come up in my
scans, I got to thinking about how to recognize one of the patterns
that makes these problems.  Essentially, any simple string between
two .* clauses that can appear in many places in output can cause
excessive backtracking.  This command will find a list of candidates
for this "bad" pattern:

$ cat nmap-service-probes | perl -ne 'print $1, "\n" if ($_ =~ /((?<!\\)\.\*[^.*]{0,10}\.\*)/)'

In looking through that list, it seems that \r\n and variations on it
are the most common problem construction we have:

$ cat nmap-service-probes | perl -ne 'print $_ if ($_ =~ m/(?<!\\)\.\*((\\r)?\\n)+\.\*/)'
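
The same search can be sketched in Python for anyone who wants to script it.
This version uses a negative lookbehind, (?<!\\), so that an escaped dot before
the star is not counted. The first sample line is abbreviated from a match line
in this thread; the other two are hypothetical.

```python
import re

# ".*", then one or more "\n" or "\r\n" escapes, then ".*", where the first
# ".*" is not preceded by a backslash.
NEWLINE_BETWEEN_WILDCARDS = re.compile(r"(?<!\\)\.\*(?:(?:\\r)?\\n)+\.\*")


def suspicious_lines(lines):
    """Return the lines that contain a newline-only segment between two .* groups."""
    return [line for line in lines if NEWLINE_BETWEEN_WILDCARDS.search(line)]


sample = [
    r"match http m|^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)|s",
    r"match http m|^HTTP/1\.0 200 .*\r\nServer: Example/([\d.]+)\r\n|s",
    r"match http m|^HTTP/1\.0 \d\d\d .*\r\n\r\n.*<title>Example</title>|s",
]

for line in suspicious_lines(sample):
    print(line)
```

Only the first and third sample lines are reported; the second has a literal
header name after the line terminator, so it is not a candidate.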

We do have one ".*.*":

Thanks for doing this. Artifacts like ".*.*" are especially embarrassing
(though it's possible this case is optimised out by PCRE).

This isn't going to fix ALL of our MATCHLIMIT problems but it should go
a long way towards making the problem better.

Agreed. Please let me/the list know if you notice any other patterns
in use that cause this warning to be generated.



Attachment: _bin

Sent through the nmap-dev mailing list
Archived at http://SecLists.Org
