>>> So my main concern would be that if we "allow" this, the only way to
>>> write an interoperable XDP program will be to use bpf_xdp_load_bytes()
>>> for every packet access. Which will be slower than DPA, so we may end up
>>> inadvertently slowing down all of the XDP ecosystem, because no one is
>>> going to bother with writing two versions of their programs. Whereas if
>>> you can rely on packet headers always being in the linear part, you can
>>> write a lot of the "look at headers and make a decision" type programs
>>> using just DPA, and they'll work for multibuf as well.
>>
>> The question I would have is what is really the 'slow down' for
>> bpf_xdp_load_bytes() vs DPA? I know you and Jesper can tell me how many
>> instructions each use. :)
>
> I can try running some benchmarks to compare the two, sure!

Okay, ran a simple test: a program that just parses the IP header, then
drops the packet. Results as follows:

Baseline (don't touch data):    26.5 Mpps / 37.8 ns/pkt
Touch data (ethernet hdr):      25.0 Mpps / 40.0 ns/pkt
Parse IP (DPA):                 24.1 Mpps / 41.5 ns/pkt
Parse IP (bpf_xdp_load_bytes):  15.3 Mpps / 65.3 ns/pkt

So 2.2 ns of overhead from reading the packet data, another 1.5 ns from
the parsing logic, and a whopping 23.8 ns extra from switching to
bpf_xdp_load_bytes(). This is with two calls to bpf_xdp_load_bytes(),
one to get the Ethernet header and another to get the IP header.
Dropping one of them also cuts the overhead roughly in half, so it seems
to fit with ~12 ns of overhead from a single call to bpf_xdp_load_bytes().

I pushed the code I used for testing here, in case someone else wants to
play around with it:

https://github.com/xdp-project/xdp-tools/tree/xdp-load-bytes

It's part of the 'xdp-bench' utility. Run it as:

./xdp-bench drop <iface> -p parse-ip

for DPA parsing, and

./xdp-bench drop <iface> -p parse-ip -l

to use bpf_xdp_load_bytes().

-Toke
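
[Editor's addendum: for readers who just want the shape of the two variants
being compared, below is a rough sketch. This is not the actual xdp-bench
code (see the repo above for that); the program and variable names are
made up for illustration, and it assumes the standard kernel headers plus
a libbpf recent enough to declare bpf_xdp_load_bytes().]

/* Rough sketch of the two parsing approaches; not the actual xdp-bench code. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Sink for the parsed field so the compiler can't optimise the reads away;
 * the real benchmark drops every packet after parsing. */
static volatile __u64 proto_seen;

/* DPA variant: bounds-check against data_end, then read the headers
 * directly out of the linear packet area. */
SEC("xdp")
int parse_ip_dpa(struct xdp_md *ctx)
{
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;
        struct iphdr *iph;

        if ((void *)(eth + 1) > data_end)
                return XDP_PASS;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
                return XDP_PASS;

        iph = (void *)(eth + 1);
        if ((void *)(iph + 1) > data_end)
                return XDP_PASS;

        proto_seen = iph->protocol;
        return XDP_DROP;
}

/* bpf_xdp_load_bytes() variant: two helper calls, one per header, matching
 * the benchmark above. This also works when the headers are not in the
 * linear part of a multibuf packet. */
SEC("xdp")
int parse_ip_load_bytes(struct xdp_md *ctx)
{
        struct ethhdr eth;
        struct iphdr iph;

        if (bpf_xdp_load_bytes(ctx, 0, &eth, sizeof(eth)))
                return XDP_PASS;
        if (eth.h_proto != bpf_htons(ETH_P_IP))
                return XDP_PASS;

        if (bpf_xdp_load_bytes(ctx, sizeof(eth), &iph, sizeof(iph)))
                return XDP_PASS;

        proto_seen = iph.protocol;
        return XDP_DROP;
}

char _license[] SEC("license") = "GPL";

The load_bytes variant copies each header into a stack buffer via a helper
call, which is where the extra per-call overhead measured above comes from.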