>>> So my main concern would be that if we "allow" this, the only way to
>>> write an interoperable XDP program will be to use bpf_xdp_load_bytes()
>>> for every packet access. Which will be slower than DPA, so we may end up
>>> inadvertently slowing down all of the XDP ecosystem, because no one is
>>> going to bother with writing two versions of their programs. Whereas if
>>> you can rely on packet headers always being in the linear part, you can
>>> write a lot of the "look at headers and make a decision" type programs
>>> using just DPA, and they'll work for multibuf as well.
>>
>> The question I would have is what is really the 'slow down' for
>> bpf_xdp_load_bytes() vs DPA? I know you and Jesper can tell me how many
>> instructions each use. :)
>
> I can try running some benchmarks to compare the two, sure!

Okay, ran a simple test: a program that just parses the IP header, then
drops the packet. Results as follows:

Baseline (don't touch data):    26.5 Mpps / 37.8 ns/pkt
Touch data (ethernet hdr):      25.0 Mpps / 40.0 ns/pkt
Parse IP (DPA):                 24.1 Mpps / 41.5 ns/pkt
Parse IP (bpf_xdp_load_bytes):  15.3 Mpps / 65.3 ns/pkt

So 2.2 ns of overhead from reading the packet data, another 1.5 ns from
the parsing logic, and a whopping 23.8 ns extra from switching to
bpf_xdp_load_bytes(). This is with two calls to bpf_xdp_load_bytes(),
one to get the Ethernet header and another to get the IP header.
Dropping one of them also cuts the overhead roughly in half, so it seems
to fit with ~12 ns of overhead from a single call to bpf_xdp_load_bytes().

I pushed the code I used for testing here, in case someone else wants to
play around with it:

https://github.com/xdp-project/xdp-tools/tree/xdp-load-bytes

It's part of the 'xdp-bench' utility. Run it as:

./xdp-bench drop <iface> -p parse-ip

for DPA parsing, and

./xdp-bench drop <iface> -p parse-ip -l

to use bpf_xdp_load_bytes().

-Toke
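
[Editor's addendum: for readers who just want the shape of the two variants
being compared, below is a rough sketch. This is not the actual xdp-bench
code (see the repo above for that); the program and variable names are
made up for illustration, and it assumes the standard kernel headers plus
a libbpf recent enough to declare bpf_xdp_load_bytes().]

/* Rough sketch of the two parsing approaches; not the actual xdp-bench code. */
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/ip.h>
#include <linux/in.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Sink for the parsed field so the compiler can't optimise the reads away;
 * the real benchmark drops every packet after parsing. */
static volatile __u64 proto_seen;

/* DPA variant: bounds-check against data_end, then read the headers
 * directly out of the linear packet area. */
SEC("xdp")
int parse_ip_dpa(struct xdp_md *ctx)
{
        void *data = (void *)(long)ctx->data;
        void *data_end = (void *)(long)ctx->data_end;
        struct ethhdr *eth = data;
        struct iphdr *iph;

        if ((void *)(eth + 1) > data_end)
                return XDP_PASS;
        if (eth->h_proto != bpf_htons(ETH_P_IP))
                return XDP_PASS;

        iph = (void *)(eth + 1);
        if ((void *)(iph + 1) > data_end)
                return XDP_PASS;

        proto_seen = iph->protocol;
        return XDP_DROP;
}

/* bpf_xdp_load_bytes() variant: two helper calls, one per header, matching
 * the benchmark above. This also works when the headers are not in the
 * linear part of a multibuf packet. */
SEC("xdp")
int parse_ip_load_bytes(struct xdp_md *ctx)
{
        struct ethhdr eth;
        struct iphdr iph;

        if (bpf_xdp_load_bytes(ctx, 0, &eth, sizeof(eth)))
                return XDP_PASS;
        if (eth.h_proto != bpf_htons(ETH_P_IP))
                return XDP_PASS;

        if (bpf_xdp_load_bytes(ctx, sizeof(eth), &iph, sizeof(iph)))
                return XDP_PASS;

        proto_seen = iph.protocol;
        return XDP_DROP;
}

char _license[] SEC("license") = "GPL";

The load_bytes variant copies each header into a stack buffer via a helper
call, which is where the extra per-call overhead measured above comes from.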