Re: [tcpdump-workers] Performance impact with multiple pcap handlers on Linux

On Tue, Dec 22, 2020 at 02:28:17PM -0800, Guy Harris wrote:
> On Dec 22, 2020, at 2:05 PM, Linus Lüssing via tcpdump-workers <tcpdump-workers@xxxxxxxxxxxxxxxxx> wrote:
> 
> > I was experimenting a bit with migrating from the use of
> > pcap_offline_filter() to pcap_setfilter().
> > 
> > I was a bit surprised that installing for instance 500 pcap
> > handlers
> 
> What is a "pcap handler" in this context?  An open live-capture pcap_t?
> 
> > with a BPF rule "arp" via pcap_setfilter() reduced
> > the TCP performance of iperf3 over veth interfaces from 73.8 Gbits/sec
> > to 5.39 Gbits/sec. Using only one or even five handlers seemed
> > fine (71.7 Gbits/sec and 70.3 Gbits/sec).
> > 
> > Is that expected?
> > 
> > Full test setup description and more detailed results can be found
> > here: https://github.com/lemoer/bpfcountd/pull/8
> 
> That talks about numbers of "rules" rather than "handlers".  It does speak of "pcap *handles*"; did you mean "handles", rather than "handlers"?

Sorry, right, I meant pcap handles everywhere.

So far the bpfcountd code uses a single pcap_t handle created via one
pcap_open_live() call. For each received packet it then iterates over
a list of user-specified filter expressions, applies
pcap_offline_filter() for each filter to the packet, and counts the
number of packets and packet bytes that matched each filter
expression.
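
For illustration, a minimal sketch of that current approach (the
counter struct, the device name "eth0" and the two example expressions
are made up here; bpfcountd's real code is structured differently):

#include <pcap/pcap.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-filter counter, for illustration only. */
struct counter {
	struct bpf_program prog;   /* compiled user-supplied expression */
	unsigned long packets;
	unsigned long bytes;
};

static void handle_packet(u_char *user, const struct pcap_pkthdr *hdr,
			  const u_char *data)
{
	struct counter *counters = (struct counter *)user;
	/* 2 filters in this sketch; bpfcountd takes an arbitrary list. */
	for (int i = 0; i < 2; i++) {
		if (pcap_offline_filter(&counters[i].prog, hdr, data)) {
			counters[i].packets++;
			counters[i].bytes += hdr->len;  /* original length */
		}
	}
}

int main(void)
{
	char errbuf[PCAP_ERRBUF_SIZE];
	const char *exprs[2] = { "arp", "icmp6" };  /* example expressions */
	struct counter counters[2];
	memset(counters, 0, sizeof(counters));

	pcap_t *p = pcap_open_live("eth0", 128, 1, 100, errbuf);
	if (p == NULL) {
		fprintf(stderr, "pcap_open_live: %s\n", errbuf);
		return 1;
	}
	for (int i = 0; i < 2; i++) {
		if (pcap_compile(p, &counters[i].prog, exprs[i], 1,
				 PCAP_NETMASK_UNKNOWN) == -1) {
			fprintf(stderr, "pcap_compile: %s\n", pcap_geterr(p));
			return 1;
		}
	}
	/* Every captured packet is copied to userspace and then matched
	 * against each compiled filter in turn. */
	pcap_loop(p, -1, handle_packet, (u_char *)counters);
	pcap_close(p);
	return 0;
}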

> 
> Do those "rules" correspond to items in the filter expression that's compiled into BPF code, or do they correspond to open `pcap_t`s?  If a "rule" corresponds to a "handle", then does it correspond to an open pcap_t?
> 
> Or do they correspond to an entire filter expression?

What I meant by "rule" was an entire filter expression. The user
specifies a list of filter expressions, and bpfcountd counts, for each
of them, how many packets and how many packet bytes matched.

Basically we want to do live measurements of the overhead of the mesh
routing protocol and to measure and dissect the layer 2 broadcast
traffic, i.e. how much ARP, DHCP, ICMPv6 NS/NA/RS/RA, MDNS, LLDP
overhead etc. we have.

> 
> Does this change involve replacing a *single* pcap_t, on which you use pcap_offline_filter() with multiple different filter expressions, with *multiple* pcap_t's, with each one having a separate filter, set with pcap_setfilter()?  If so, note that this involves replacing a single file descriptor with multiple file descriptors, and replacing a single ring buffer into which the kernel puts captured packets with multiple ring buffers into *each* of which the kernel puts captured packets, which increases the amount of work the kernel does.

Correct. I tried to replace the single pcap_t with multiple pcap_t's,
one for each filter expression the user specified, setting the filter
on each pcap_t via pcap_setfilter() and removing the userspace
filtering via pcap_offline_filter().
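
Roughly like the sketch below (the helper name, snaplen and timeout
values are made up and error handling is abbreviated; this is not the
actual bpfcountd patch). The point is that the kernel-side filter
drops non-matching packets before they are copied into that handle's
ring buffer:

#include <pcap/pcap.h>
#include <stdio.h>

/* Open one live handle per filter expression and install the filter
 * in the kernel, so non-matching packets never reach userspace. */
static pcap_t *open_filtered(const char *dev, const char *expr,
			     char *errbuf)
{
	pcap_t *p = pcap_create(dev, errbuf);
	if (p == NULL)
		return NULL;
	pcap_set_snaplen(p, 128);    /* headers suffice for counting */
	pcap_set_timeout(p, 100);
	if (pcap_activate(p) < 0) {  /* fails with "Too many open files"
					once the fd limit is reached */
		fprintf(stderr, "pcap_activate: %s\n", pcap_geterr(p));
		pcap_close(p);
		return NULL;
	}
	struct bpf_program prog;
	if (pcap_compile(p, &prog, expr, 1, PCAP_NETMASK_UNKNOWN) == -1 ||
	    pcap_setfilter(p, &prog) == -1) {
		fprintf(stderr, "filter '%s': %s\n", expr, pcap_geterr(p));
		pcap_close(p);
		return NULL;
	}
	pcap_freecode(&prog);
	return p;
}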

The idea was to improve performance by A) avoiding copying the actual
packet data to userspace, and B) hopefully reducing the impact of
running bpfcountd / libpcap on traffic which does not match any filter
expression.

Right, for matching, captured traffic the kernel probably does more
work with multiple ring buffers, as you described. But with bpfcountd
we only want to match, measure and dissect broadcast and mesh protocol
traffic, for which we expect the matching traffic to be only about
100 to 500 kbit/s.

Unicast IP traffic at much higher rates will not be matched, and the
idea/hope behind these changes was to leave the IP unicast performance
mostly unaffected while still measuring and dissecting the other,
non-unicast-IP traffic.

> 
> > PS: And I was also surprised that there seems to be a limit of
> > only 510 pcap handlers on Linux.
> 
> "handlers" or "handles"?
> 
> If it's "handles", as in "pcap_t's open for live capture", and if you're switching from a single pcap_t to multiple pcap_t's, that means using more file descriptors (so that you may eventually run out) and more ring buffers (so that the kernel may eventually say "you're tying up too much wired memory for all those ring buffers").
> 
> In either of those cases, the attempt to open a pcap_t will eventually get an error; what is the error that's reported?

pcap_activate() returns "socket: Too many open files" for the
511th pcap_t and pcap_activate() call.

Ah! "ulimit -n" as root returns "1024" for me. Increasing that
limit helps, I can have more pcap_t handles then, thanks!

(as a non-root user "ulimit -n" returns 1048576 - interesting that
an unprivileged user can open more sockets than root by default,
didn't expect that)
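
In case it is useful: one possible workaround inside the program
itself would be to raise the soft RLIMIT_NOFILE limit up to the hard
limit before opening the handles, roughly like this sketch:

#include <sys/resource.h>
#include <stdio.h>

/* Raise the soft RLIMIT_NOFILE limit to the hard limit, so more
 * pcap_t handles (one socket each on Linux) can be opened without
 * changing "ulimit -n" in the shell. */
static int raise_nofile_limit(void)
{
	struct rlimit rl;
	if (getrlimit(RLIMIT_NOFILE, &rl) == -1) {
		perror("getrlimit");
		return -1;
	}
	rl.rlim_cur = rl.rlim_max;   /* soft limit up to the hard limit */
	if (setrlimit(RLIMIT_NOFILE, &rl) == -1) {
		perror("setrlimit");
		return -1;
	}
	return 0;
}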


