tcpdump mailing list archives

Re: Flush OS buffer before termination

From: Garri Djavadyan <g.djavadyan () gmail com>
Date: Mon, 21 Oct 2024 00:33:09 +0200

On Sun, 2024-10-20 at 10:27 -0700, Guy Harris wrote:

On Oct 20, 2024, at 2:57 AM, Garri Djavadyan <g.djavadyan () gmail com>
wrote:

I have to use a very big buffer with very slow storage, much slower
than the rate of incoming packets received by the filter, and it is
preferred not to lose a single packet after initiating termination of
the process.


What do you mean by "with a very slow storage"?  You can set the size
with -B, but that just tells the capture mechanism in the kernel how
big a buffer to allocate.  It's not as if it tells it to be stored in
some slower form of memory.


Let me show an example. To demonstrate the issue, I am generating a
2MB/s stream of dummy packets:

[src]# pv -L 2M /dev/zero | dd bs=1472 > /dev/udp/192.168.0.1/12345


and dumping them to storage with a cgroup-v2-restricted write speed of
1MB/s:

[dst]# lsblk /dev/loop0
NAME  MAJ:MIN RM  SIZE RO TYPE MOUNTPOINTS
loop0   7:0    0  3.9G  0 loop /mnt/test

[dst]# cat /sys/fs/cgroup/test/io.max
7:0 rbps=max wbps=1024000 riops=max wiops=max


To temporarily avoid kernel-level drops,


Emphasis on *temporarily* - 2MB/s worth of packet data can only be
saved in its entirety if you have 2MB/s or greater write speed.


That is right. However, it also depends on how long one needs to
mediate mismatching rates using a large input buffer. For example, with
a 2GB input buffer and 1MB/s rate difference, one could safely be
filling the buffer for more than half an hour. Safe buffer draining
would help a lot in such situations.
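
The "more than half an hour" estimate can be checked with a few lines
of arithmetic. This is only a sketch: the function name is made up for
illustration, and it assumes round decimal units (1MB = 10^6 bytes) and
constant rates, which the real test setup only approximates.

```python
def seconds_until_full(buffer_bytes: int, inflow_bps: int, drain_bps: int) -> float:
    """Time until the buffer overflows when packets arrive faster than they are written."""
    surplus = inflow_bps - drain_bps
    if surplus <= 0:
        # The writer keeps up, so the buffer never fills.
        return float("inf")
    return buffer_bytes / surplus


MB = 1_000_000
GB = 1_000 * MB

# 2GB buffer, 2MB/s arriving, 1MB/s written to storage.
t = seconds_until_full(2 * GB, 2 * MB, 1 * MB)
print(f"{t:.0f} s = {t / 60:.1f} min")  # 2000 s = 33.3 min
```

The same amount of data is still sitting in the buffer when the capture
is stopped at that point, which is why draining it matters.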

it is clearly seen that the input buffer is being filled at a 1MB/s
rate (the difference between the generated traffic rate (2MB/s) and the
writing speed of the storage (1MB/s)):

tcpdump: 0 packets captured, 0 packets received by filter, 0 packets dropped by kernel
tcpdump: 218 packets captured, 715 packets received by filter, 0 packets dropped by kernel


On all platforms, "packets captured" means "packets read from libpcap
and written to the capture file".

On Linux, "packets received by filter" means "packets that passed the
filter" (rather than "packets that were run through the filter,
whether or not they passed the filter", which is what it means on
*BSD/macOS/Solaris 11/AIX; unfortunately, you can't get the latter
value from Linux and can't get the former value from BSD, so that
value *can't* be made to mean the same thing on all platforms).  It
includes packets that passed the filter but could not be added to the
buffer because the buffer was full.

On Linux, "packets dropped by kernel" means "packets that passed the
filter but could not be added to the buffer because the buffer was
full".

(The pcap_stats man page has an entire paragraph devoted to giving
the message that the meaning of the statistics differs between
platforms.)

I.e., when tcpdump exits, the difference, on Linux, between "packets
received by filter" and "packets captured" is, indeed, "packets
dropped because tcpdump exited without draining the packet buffer". 
(On *BSD/macOS/Solaris 11/AIX, the latter value cannot be determined,
as per the above.)
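
Applied to the counters shown earlier, the Linux interpretation can be
sketched as below. The function name is hypothetical, and subtracting
the kernel drops is an assumption based on the statement above that, on
Linux, "packets received by filter" also counts packets that did not
fit in the buffer; with 0 drops it reduces to the plain difference.

```python
def undrained_on_exit(captured: int, received: int, dropped: int) -> int:
    """Packets still in the kernel buffer when the capture stopped (Linux counters only).

    captured: packets read from libpcap and written to the capture file
    received: packets that passed the filter, including those dropped
    dropped:  packets that passed the filter but found the buffer full
    """
    return received - dropped - captured


# Counters from the second tcpdump line in the example:
# 218 captured, 715 received by filter, 0 dropped by kernel.
print(undrained_on_exit(captured=218, received=715, dropped=0))  # 497
```

This does not carry over to *BSD/macOS/Solaris 11/AIX, where "packets
received by filter" has a different meaning.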

There are a few options to overcome the problem. For example, by
dumping packets to memory storage first (e.g. /dev/shm)


Presumably meaning you specified "-w /dev/shm" or something such as
that?

If so, how does that make a difference?


I mean I can first dump packets to the lightning-fast RAM storage and,
after being done with the capturing part, copy the dump to the slow
storage.


I.e., it means that, when you signal tcpdump to exit, it's not as far
behind the capture mechanism with regards to writing to the capture
file, because it's stalling less waiting for write() calls to finish
(if the write rate limitation you mention limits the rate at which
write() calls can push data to the file descriptor), so the "packets
captured" count is larger.


Exactly.

I see. Thank you so much for the explanation.

Do you think this case can justify feature requests both for libpcap
and tcpdump on GitHub?


Yes, as it means that tcpdump (and, potentially, other programs such
as Wireshark) can write out *all* packets received before being told
to stop capturing.

The implementations for various platforms would probably have to 1)
set a "drop all packets" filter on the capture device, 2) possibly
put the capture device in non-blocking mode (as there's no point in
blocking, as no more packets will be seen), and 3) cause the packet
processing loop in libpcap to quit as soon as it finds that there
are no more packets available to read.  For programs using
pcap_loop(), that should be transparent; for programs using
pcap_dispatch(), they would have to treat a return value of 0, if
they've put the capture device in "draining mode", as meaning "done"
rather than "the packet buffer timeout expired and no packets were
provided, keep capturing".

tcpdump uses pcap_loop(), so it'd only have to be changed to use the
new "stop capturing" API.
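
The three steps and the changed meaning of a 0 return value can be
modelled in a few lines. This is a toy model only, not libpcap code:
ModelCapture, start_draining(), deliver() and drain() are hypothetical
names invented for the illustration, and no such API exists in libpcap
at the time of this thread.

```python
from collections import deque


class ModelCapture:
    """Toy stand-in for a capture handle; not a libpcap binding."""

    def __init__(self, buffered):
        self.buffer = deque(buffered)
        self.draining = False

    def start_draining(self):
        # Models steps 1 and 2: accept no new packets and never block.
        self.draining = True

    def deliver(self, packet):
        # Kernel side: once draining, the "drop all packets" filter rejects everything.
        if not self.draining:
            self.buffer.append(packet)

    def dispatch(self, max_packets, callback):
        # Handle up to max_packets buffered packets and return how many were handled.
        handled = 0
        while self.buffer and handled < max_packets:
            callback(self.buffer.popleft())
            handled += 1
        return handled


def drain(cap, callback, batch=2):
    """Stop accepting packets, then write out everything already buffered."""
    cap.start_draining()
    total = 0
    while True:
        handled = cap.dispatch(batch, callback)
        if handled == 0 and cap.draining:
            # Step 3: in draining mode, 0 means "done", not "timeout expired".
            return total
        total += handled


written = []
cap = ModelCapture(["p1", "p2", "p3"])
cap.deliver("p4")                   # arrives before the stop request, so it is kept
total = drain(cap, written.append)
cap.deliver("p5")                   # arrives after the stop request, so it is filtered out
print(total, written)
```

In the model, every packet buffered before the stop request reaches the
callback, and nothing that arrives afterwards is added.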


Thank you for sharing your thoughts on this. It is good to know that it
is feasible to implement. I will open a feature request for libpcap for
now.

Guy, thank you so much for all your comments. It is much appreciated.

Regards,
Garri
