Nmap Development mailing list archives

Re: Writing high-performance npcap application


From: Daniel Miller <bonsaiviking () gmail com>
Date: Fri, 29 Apr 2022 13:11:05 -0500

Jan,

Thanks for your interest in Npcap! I'll try to answer questions inline
below.

On Wed, Apr 27, 2022 at 1:21 PM Jan Danielsson <jan.m.danielsson () gmail com>
wrote:

    [The npcap page said it was ok to use nmap mailing list for npcap
related questions.  If there's a more appropriate forum, please point me
to it.]


Questions can also be posted as Issues on our GitHub page, but the nmap-dev
mailing list is publicly archived, so it works well for this type of
discussion.


    At first I got pretty abysmal performance because I used
pcap_sendpacket().


I believe most of the performance difference there would be because Npcap
so far does not support a "nonblocking mode" for send operations. This is
inherited from WinPcap, and sounds like an awesome place to start with
performance enhancements!


This was expected, so I implemented sendqueue
support in the pcap crate, and used that instead.  This however did
not work -- I kept running out of memory.  This was the first minor
stumbling block:  I thought that one could reuse a sendqueue buffer
(i.e. it implicitly gets reset after a transmission), but that does not
seem to be the case?


The sendqueue buffer is not reset, partly because comparing the
queue's `len` to the return value of pcap_sendqueue_transmit() is how you
can know if some of the queue was not transmitted due to error (check
pcap_geterr() if so). You *can* reuse the memory, just set the `len` to 0
before calling pcap_sendqueue_queue() again. In fact, the code behind
pcap_sendqueue_queue() is pretty simple (just some offset math and 2 memcpy
operations), so if you have a way to do it faster, feel free to construct
your own buffer and attach it to the pcap_send_queue structure.


  When I rewrote the code to allocate/free a new
sendqueue for each batch, then it worked.  And I got _really_ good
performance, as well.  Just to be clear:  Have I understood it correctly
that the sendqueue does not autoreset after transmission, and I need to
allocate a new sendqueue for each batch?


It does not reset, but allocating a new one each time is not necessary;
just reset the queue's len to 0.

    However, when I ran a long test, I got an error which says that some
resources were exhausted.  I obviously need to double-check that it's
actually releasing the sendqueue on each iteration -- but I'm pretty
sure it does.  However, I'm sending *a lot* of packets.  Is there any
known resource leak in npcap when sending very many packets using
sendqueues?


There's not a known resource leak. We'd want to start by determining if the
resource exhaustion is on the user-side (your app, Packet.dll, and
wpcap.dll) or on the driver side (npcap.sys). If the error was reported via
pcap_geterr(), then it almost certainly came from the driver. To diagnose,
we need to identify what condition triggers the error:

1. A particular pcap_send_queue will reliably trigger the error, even if it
is the first one sent.
2. The error triggers only after several different calls to
pcap_sendqueue_transmit().

Then we can identify whether it is the total number of packets or the
rate of packet transmission that is the issue. If it is the rate of
transmission, then adding a timestamp to each pcap_pkthdr and using the
sync parameter should avoid triggering the error. I'm not suggesting this as a
workaround necessarily, but more as a diagnostic tool. Some resources are
used while the packet is being sent asynchronously in the driver, and if we
overcommit those resources, the driver will end up returning
STATUS_INSUFFICIENT_RESOURCES.


    The receiver is in much worse shape.  It will receive a number of
packets (a few thousand, IIRC) and then simply stop receiving new packets.

    Are there any special considerations one must take into account when
trying to receive packets at a high rate?  At first I thought the
capture buffer may be overflowing (it was set at 1MB), but when I
increased it to 16MB it stopped at roughly the same number of packets.
(The application does not try to store any data on the receiver -- it
just receives the packet, checks that its index matches the
expected index, and then throws away the packet).


This is a bit more concerning, since receiving packets gets a lot more
attention and the code is already very well tested at high rates. However,
we should rule out a few things to be sure.

When you say "stop receiving new packets," do you mean that you start
getting errors when you call pcap_* functions, or that a call to
pcap_next() or pcap_next_ex() does not return and/or your callback to
pcap_dispatch() or pcap_loop() is not called? Npcap has a few extra
configuration parameters beyond the standard ones for libpcap, and in some
cases these can mean that packets have arrived but you can't get them from
the kernel because there is not enough data:

1. If the read timeout is 0 (default for pcap_create(), to_ms parameter to
pcap_open_live()), then the application will wait "forever" for the read
event to be signaled by the driver before issuing a ReadFile() to fetch
packets.
2. If the MinToCopy value has been set (default 16KB), then the driver will
not signal the read event until at least that much data has been captured.
This is intended to reduce overhead of frequent calls to ReadFile().

So if you have a packet filter set for a very specific type of traffic, and
that traffic stops when there is less than 16KB (default MinToCopy) in the
kernel buffer, a read timeout of 0 means you will never get those last few
packets because the PacketReadPacket() function in Packet.dll is doing a
WaitForSingleObject(ReadEvent, INFINITE), and the driver will never signal
the event. There are a few solutions:

1. Use immediate mode (pcap_set_immediate_mode()) to get packets as soon as
they come in (for Npcap, this is implemented as pcap_setmintocopy(0)). This
may have a negative impact on performance for the application as a whole,
including on other platforms. Setting the MinToCopy value directly is also
supported via pcap_setmintocopy(), but it may result in differences in
behavior between platforms, since it is an Npcap/WinPcap-only setting.
2. Set a positive timeout with pcap_set_timeout(). This guarantees (for
Npcap) that you will get any waiting packets within the timeout period,
even if they are less than MinToCopy. Not all libpcap-supported platforms
support a read timeout.
3. Put the pcap handle in nonblocking mode with pcap_setnonblock(). This
will cause Npcap to ignore the read event entirely and issue a ReadFile()
no matter what. In this case, pcap_dispatch() and pcap_next() may return
either 0 or PCAP_ERROR if there are no packets available. You can use
pcap_getevent() to get a handle to the event that your application can wait
on directly, in case you want to have other logic around the wait, such as
using WaitForMultipleObjects if you have several capture handles or other
I/O sources to wait for.

I hope this has helped a bit. It has definitely given me a lot of good
stuff to think about for the future of Npcap!

Dan
_______________________________________________
Sent through the dev mailing list
https://nmap.org/mailman/listinfo/dev
Archived at https://seclists.org/nmap-dev/
