Nmap Development mailing list archives

[RFC] PCRE MATCHLIMIT and the use of greedy quantifiers (-sV scans)
From: Brandon Enright <bmenrigh () ucsd edu>
Date: Thu, 2 Apr 2009 23:09:57 +0000

Doug, Developers,

In scanning the thousands of services on our network I regularly run
into the following error:

Warning: Hit PCRE_ERROR_MATCHLIMIT when probing for service http with the regex '^HTTP/1\.0 \d\d\d .*\r\n.*\r\n\r\n.*\t<title>Strongdc\+\+ webserver - Login Page</title>\t'

There are a number of match lines that trigger this; here are a couple more:

Warning: Hit PCRE_ERROR_MATCHLIMIT when probing for service http with the regex '^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)\nContent-type: text/html\n\n<HEAD>\n<TITLE>\nServer Administration\n</TITLE>\n\n<META NAME=\"IBMproduct\" CONTENT=\"ADSM\">\n<META NAME=\"IBMproductVersion\" CONTENT=\"([\d.]+)\">.*Storage Management Server for AIX'

Warning: Hit PCRE_ERROR_MATCHLIMIT when probing for service http with the regex '^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)\r?\n.*<TITLE>\nServer Administration\n</TITLE>.*<META NAME=\"IBMproductVersion\" CONTENT=\"([\d.]+)\">.*<TITLE>\nAdministrator Login\n</TITLE>.*Storage Management Server for Windows'

The issue is that these match lines overuse, or poorly use, the greedy
quantifier ".*", as in:

"HTTP/1\.0 \d\d\d .*\n.*Server:"

The problem arises when matching against service responses that contain
a large number of partial matches for the text between the .* constructs,
which forces the engine to backtrack excessively while trying to match.
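
To illustrate why this gets expensive, here is a toy sketch in Python.
It is not how PCRE works internally (PCRE has its own optimizations and
counts work against MATCHLIMIT in its own way); it is a naive
backtracking matcher for patterns of the form ".*A.*B", with a
hypothetical helper name (count_attempts) and a made-up response body.
It only shows that when "A" (here a newline) occurs many times and "B"
never occurs, the number of attempts grows roughly quadratically with
the input size.

```python
def count_attempts(segments, text):
    """Naively match .*seg0.*seg1... against text.

    Returns (matched, attempts), where attempts counts every position
    at which a literal segment was tried. Toy model only, not PCRE.
    """
    attempts = 0

    def match_from(i, pos):
        nonlocal attempts
        if i == len(segments):
            return True
        seg = segments[i]
        # Greedy ".*": try the farthest possible start first, then back off.
        for start in range(len(text) - len(seg), pos - 1, -1):
            attempts += 1
            if text.startswith(seg, start) and match_from(i + 1, start + len(seg)):
                return True
        return False

    return match_from(0, 0), attempts


if __name__ == "__main__":
    # Hypothetical response: many newlines, but no "Server:" anywhere.
    for lines in (100, 200, 400):
        matched, attempts = count_attempts(["\n", "Server:"], "a\n" * lines)
        print(lines, matched, attempts)
```

Every newline is a partial match that makes the matcher rescan the rest
of the input for "Server:", which is the behavior described above.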

In all the cases where I've run into this issue, I've been able to fix
the match by using atomic grouping and lazy quantification.  Here is a
match diff:

-match http m|^HTTP/1\.0 \d\d\d .*\n.*Server: ADSM_HTTP/([\d.]+)\r?\n.*<TITLE>\nServer ...snip...
+match http m|^HTTP/1\.0 \d\d\d (?>.*?\n).*Server: ADSM_HTTP/([\d.]+)\r?\n.*<TITLE>\nServer ...snip...

By changing ".*\n.*" to "(?>.*?\n).*" the match will still work
properly, but the engine cannot backtrack into the atomic group, so it
can no longer cause a MATCHLIMIT error there.  Note that the first
greedy quantifier must be made lazy; otherwise there are cases where it
would consume up to a \n too far into the string, and the match would
fail because the group needs to be backtracked into and can't be.
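
The lazy-versus-greedy point can be checked with a small Python sketch.
Two assumptions: the banner below is a made-up example, and the pattern
is compiled with re.DOTALL so that "." matches newlines (the case where
a greedy ".*" can run past the first \n).  Python's re module only
gained the (?>...) syntax in version 3.11, so the sketch emulates an
atomic group with the common lookahead-plus-backreference idiom
(?=(X))\1 rather than using (?>X) directly.

```python
import re

# Hypothetical banner, for illustration only.
banner = "HTTP/1.0 200 OK\nDate: today\nServer: ADSM_HTTP/0.1\n"

# (?=(X))\1 behaves like the atomic group (?>X): once the lookahead has
# succeeded, the engine will not go back and try a different X.
lazy_atomic = re.compile(
    r'^HTTP/1\.0 \d\d\d (?=(.*?\n))\1.*Server: ADSM_HTTP/([\d.]+)',
    re.DOTALL)
greedy_atomic = re.compile(
    r'^HTTP/1\.0 \d\d\d (?=(.*\n))\1.*Server: ADSM_HTTP/([\d.]+)',
    re.DOTALL)

lazy_result = lazy_atomic.search(banner)
greedy_result = greedy_atomic.search(banner)

if __name__ == "__main__":
    # Lazy: the group stops at the first newline, so the rest can match.
    print("lazy:", lazy_result.group(2) if lazy_result else None)
    # Greedy: the group swallows everything up to the last newline and
    # cannot give any of it back, so "Server:" is never found.
    print("greedy:", greedy_result.group(2) if greedy_result else None)
```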

Rather than fix only the handful of these that happen to come up in my
scans, I got to thinking about how to recognize one of the patterns
that causes these problems.  Essentially, any simple string between
two .* clauses that can appear in many places in the output can cause
excessive backtracking.  This command will find a list of candidates
for this "bad" pattern:

$ cat nmap-service-probes | perl -ne 'print $1, "\n" if ($_ =~ /((?<!\\)\.\*[^.*]{0,10}\.\*)/)'

In looking through that list, it seems that \r\n and variations on it
are the most common problematic construction we have:

$ cat nmap-service-probes | perl -ne 'print $_ if ($_ =~ m/(?<!\\)\.\*((\\r)?\\n)+\.\*/)'

We do have one ".*.*":

match http m|^HTTP/1\.0 200 OK\r\n.*<TITLE>Main Menu \[[\w-_.]+\]</TITLE>.*.*<A title=\"Return to Main Menu\" HREF=\"/\">TivoWebPlus</A>|s p/TiveWebPlus Project httpd/ d/media device/

The fix I propose is to change all cases of ".*<newline variation>.*"
into "(?>.*?<newline variation>).*".
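
As a rough sketch of how that rewrite could be automated, the Python
below applies it to the text of a regex.  The function name
(apply_atomic_fix) is hypothetical, and the handling of chained cases
such as ".*\r\n.*\r\n\r\n.*" (each ".*<newline>" piece becomes its own
atomic group) is my assumption about what the rewrite should do there.
It only edits text; it does not check that the new regex still matches
the same services, and it ignores corner cases like an escaped backslash
directly before ".*".

```python
import re

# An unescaped ".*" followed by one or more "\n" / "\r\n" escapes (as
# they appear literally in the probe file), with another ".*" right
# after.  The trailing ".*" is only looked at, not consumed, so chained
# occurrences are each rewritten.
NEWLINE_BETWEEN_WILDCARDS = re.compile(
    r'(?<!\\)\.\*((?:(?:\\r)?\\n)+)(?=\.\*)')


def apply_atomic_fix(regex_text):
    """Rewrite ".*<newlines>.*" as "(?>.*?<newlines>).*" in regex text."""
    return NEWLINE_BETWEEN_WILDCARDS.sub(
        lambda m: '(?>.*?' + m.group(1) + ')', regex_text)


if __name__ == "__main__":
    print(apply_atomic_fix(r'^HTTP/1\.0 \d\d\d .*\n.*Server:'))
    print(apply_atomic_fix(r'^HTTP/1\.0 \d\d\d .*\r\n.*\r\n\r\n.*\t<title>'))
```

Any rewritten match line would still need to be tested against real
service responses before going into nmap-service-probes.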

This isn't going to fix ALL of our MATCHLIMIT problems but it should go
a long way towards making the problem better.

Comments welcome,



Sent through the nmap-dev mailing list
Archived at http://SecLists.Org
