Nmap Development mailing list archives

Re: [BUG] NSE/Nsock filehandle exhaustion


From: Stoiko Ivanov <stoiko () xover htu tuwien ac at>
Date: Thu, 30 Aug 2007 20:35:10 +0200

Hi,

On Tue, Aug 28, 2007 at 12:55:04AM +0000, Brandon Enright wrote:
> Developers,
>
> I hate to be reporting a bug without a patch, but I haven't been able to
> fully track this one down and I'm sure someone here is going to have more
> insight into the problem than me.
>
> With the latest NSE implementation compiled from SVN, Nmap runs my machine
> out of filehandles when I scan large blocks of machines at a time.
>
> Here is sample output:
>
> ...
> SCRIPT ENGINE: Will run ././nmap/current/scripts//ripeQuery.nse against
> 132.239.74.211
> SCRIPT ENGINE: Running scripts.
> SCRIPT ENGINE: Runlevel: 1.000000
> Initiating SCRIPT ENGINE at 23:49
> SCRIPT ENGINE Timing: About 0.00% done
> Socket troubles: Too many open files
> Socket troubles: Too many open files
> ...lots of socket trouble errors...
> Socket troubles: Too many open files
> Segmentation fault
>
> Over 1024 copies of the ripeQuery.nse script were being executed in that
> hostgroup.  I went ahead and increased the max filehandle count with ulimit
> -n and /etc/security/limits.conf from 1024 to 10240.  Unfortunately, instead
> of solving the problem, it hits another:
>
> SCRIPT ENGINE: Will run ././nmap/current/scripts//ripeQuery.nse against
> 132.239.75.10
> SCRIPT ENGINE: Running scripts.
> SCRIPT ENGINE: Runlevel: 1.000000
> Initiating SCRIPT ENGINE at 23:58
> SCRIPT ENGINE Timing: About 0.00% done
> nmap: gh_list.c:346: gh_list_remove_elem: Assertion `list->count != 0 ||
> (list->first == ((void *)0) && list->last == ((void *)0))' failed. Aborted

For both cases a backtrace would be a great help in chasing down the bug.
I personally use gdb for debugging, so I can only provide you with
instructions for gdb:

You can create one by allowing your shell to dump core:
$ ulimit -c unlimited

and afterwards running gdb on the executable and the core file:
$ gdb ./nmap ./core

Once gdb provides you with a prompt, just type
bt full

and you should get the function where the segfault/assertion failure
occurred (and the functions by which it was called).

> I've done some digging and the issue seems to be the number of concurrent
> sockets that are being opened.
>
> ..snip..
>
> Is it possible that Nmap/NSE is calling socket:connect() more than 1024
> times in parallel *before* the parallel scripts get to the socket:close()
> call?  This doesn't sound very likely to me.
This is exactly what is happening. NSE scripts are scheduled by a
round-robin style algorithm. All script-host combinations which will run are
stored in a list. Each time a script yields (i.e. pauses to wait for the
completion of an nsock event), the next one starts running - once the
nsock event completes, the script is put at the *end* of the list containing
all scripts.

Maybe a solution to this problem would be to put the scripts which
already got their network event done at the beginning of the list.
This way, scripts which have already started running would be preferred over
those which have had no chance to run at all, and would thus finish execution
(and close their sockets) sooner.
I've tried this and it seems to work better (although I couldn't reproduce
the assertion failure from your second try) - I'll commit the patch in a
second.

Another solution would be to change the scheduling algorithm to run at most
1024 script-host combinations in one batch (which wouldn't solve the
problem if a script opens more than one socket).


> How do I go about troubleshooting this?  I'd like some way of seeing the
> number of simultaneous scripts being blocked waiting for the connect call
> to see if it gets over 1024.
Scripts which wait for network I/O are pushed onto a list of waiting scripts
(currently this is in nse_main.cc, line 280, function process_mainloop()),
and once the network I/O request is handled they get pushed to the end of
the running_scripts list (nse_main.cc, line 346,
function process_waiting2running()).



> Maybe there needs to be some (tunable) cap on the number of parallel NSE
> scripts or the number of sockets NSE is allowed to have open at a time.
> I'm thinking that nmap.new_socket() could block if the number of
> connected/open sockets goes above some threshold.
>
> Please let me know if there is anything I can do to help troubleshoot this
> or if I need to clarify anything stated above.
The backtraces would be a great help; maybe you could pass --script-trace
as an option too.

And of course feedback on whether the bug-fix solved the issue would be
welcome.

> Brandon

cheers 
stoiko


> --
> Brandon Enright
> Network Security Analyst
> UCSD Network Operations
> bmenrigh () ucsd edu



_______________________________________________
Sent through the nmap-dev mailing list
http://cgi.insecure.org/mailman/listinfo/nmap-dev
Archived at http://SecLists.Org

