Stanislav Fomichev wrote: > This is an RFC for the alternative approach suggested by Martin and > Jakub. I've tried to CC most of the people from the original discussion, > feel free to add more if you think I've missed somebody. > > Summary: > - add new BPF_F_XDP_HAS_METADATA program flag and abuse > attr->prog_ifindex to pass target device ifindex at load time > - at load time, find appropriate ndo_unroll_kfunc and call > it to unroll/inline kfuncs; kfuncs have the default "safe" > implementation if unrolling is not supported by a particular > device > - rewrite xskxceiver test to use C bpf program and extend > it to export rx_timestamp (plus add rx timestamp to veth driver) > > I've intentionally kept it small and hacky to see whether the approach is > workable or not. Hi, I need RX timestamps now as well so was going to work on some code next week as well. My plan was to simply put a kptr to the rx descriptor in the xdp buffer. If I can read the rx descriptor I can read the timestamp, the rxhash and any other metadata the NIC has completed. All the drivers I've looked at stash the data here. I'll inline pro/cons compared to this below as I see it. > > Pros: > - we avoid BTF complexity; the BPF programs themselves are now responsible > for agreeing on the metadata layout with the AF_XDP consumer Same no BTF is needed in kernel side. Userspace and BPF progs get to sort it out. > - the metadata is free if not used Same. > - the metadata should, in theory, be cheap if used; kfuncs should be > unrolled to the same code as if the metadata was pre-populated and > passed with a BTF id Same its just a kptr at this point. Also one more advantage would be ability to read the data without copying it. > - it's not all or nothing; users can use small subset of metadata which > is more efficient than the BTF id approach where all metadata has to be > exposed for every frame (and selectively consumed by the users) Same. > > Cons: > - forwarding has to be handled explicitly; the BPF programs have to > agree on the metadata layout (IOW, the forwarding program > has to be aware of the final AF_XDP consumer metadata layout) Same although IMO this is a PRO. You only get the bytes you need and care about and can also augment it with extra good stuff so calculation only happen once. > - TX picture is not clear; but it's not clear with BTF ids as well; > I think we've agreed that just reusing whatever we have at RX > won't fly at TX; seems like TX XDP program might be the answer > here? (with a set of another tx kfuncs to "expose" bpf/af_xdp metatata > back into the kernel) Agree TX is not addressed. A bit of extra commentary. By exposing the raw kptr to the rx descriptor we don't need driver writers to do anything. And can easily support all the drivers out the gate with simple one or two line changes. This pushes the interesting parts into userspace and then BPF writers get to do the work without bother driver folks and also if its not done today it doesn't matter because user space can come along and make it work later. So no scattered kernel dependencies which I really would like to avoid here. Its actually very painful to have to support clusters with N kernels and M devices if they have different features. Doable but annoying and much nicer if we just say 6.2 has support for kptr rx descriptor reading and all XDP drivers support it. So timestamp, rxhash work across the board. To find the offset of fields (rxhash, timestamp) you can use standard BTF relocations we have all this machinery built up already for all the other structs we read, net_devices, task structs, inodes, ... so its not a big hurdle at all IMO. We can add userspace libs if folks really care, but its just a read so I'm not even sure that is helpful. I think its nicer than having kfuncs that need to be written everywhere. My $.02 although I'll poke around with below some as well. Feel free to just hang tight until I have some code at the moment I have intel, mellanox drivers that I would want to support. > > Cc: Martin KaFai Lau <martin.lau@xxxxxxxxx> > Cc: Jakub Kicinski <kuba@xxxxxxxxxx> > Cc: Willem de Bruijn <willemb@xxxxxxxxxx> > Cc: Jesper Dangaard Brouer <brouer@xxxxxxxxxx> > Cc: Anatoly Burakov <anatoly.burakov@xxxxxxxxx> > Cc: Alexander Lobakin <alexandr.lobakin@xxxxxxxxx> > Cc: Magnus Karlsson <magnus.karlsson@xxxxxxxxx> > Cc: Maryam Tahhan <mtahhan@xxxxxxxxxx> > Cc: xdp-hints@xxxxxxxxxxxxxxx > Cc: netdev@xxxxxxxxxxxxxxx > > Stanislav Fomichev (5): > bpf: Support inlined/unrolled kfuncs for xdp metadata > veth: Support rx timestamp metadata for xdp > libbpf: Pass prog_ifindex via bpf_object_open_opts > selftests/bpf: Convert xskxceiver to use custom program > selftests/bpf: Test rx_timestamp metadata in xskxceiver > > drivers/net/veth.c | 31 +++++ > include/linux/bpf.h | 1 + > include/linux/btf.h | 1 + > include/linux/btf_ids.h | 4 + > include/linux/netdevice.h | 3 + > include/net/xdp.h | 22 ++++ > include/uapi/linux/bpf.h | 5 + > kernel/bpf/syscall.c | 28 ++++- > kernel/bpf/verifier.c | 60 +++++++++ > net/core/dev.c | 7 ++ > net/core/xdp.c | 28 +++++ > tools/include/uapi/linux/bpf.h | 5 + > tools/lib/bpf/libbpf.c | 1 + > tools/lib/bpf/libbpf.h | 6 +- > tools/testing/selftests/bpf/Makefile | 1 + > .../testing/selftests/bpf/progs/xskxceiver.c | 43 +++++++ > tools/testing/selftests/bpf/xskxceiver.c | 119 +++++++++++++++--- > tools/testing/selftests/bpf/xskxceiver.h | 5 +- > 18 files changed, 348 insertions(+), 22 deletions(-) > create mode 100644 tools/testing/selftests/bpf/progs/xskxceiver.c > > -- > 2.38.1.273.g43a17bfeac-goog >