RE: [RFC bpf-next 0/5] xdp: hints via kfuncs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Stanislav Fomichev wrote:
> This is an RFC for the alternative approach suggested by Martin and
> Jakub. I've tried to CC most of the people from the original discussion,
> feel free to add more if you think I've missed somebody.
> 
> Summary:
> - add new BPF_F_XDP_HAS_METADATA program flag and abuse
>   attr->prog_ifindex to pass target device ifindex at load time
> - at load time, find appropriate ndo_unroll_kfunc and call
>   it to unroll/inline kfuncs; kfuncs have the default "safe"
>   implementation if unrolling is not supported by a particular
>   device
> - rewrite xskxceiver test to use C bpf program and extend
>   it to export rx_timestamp (plus add rx timestamp to veth driver)
> 
> I've intentionally kept it small and hacky to see whether the approach is
> workable or not.

Hi,

I need RX timestamps now as well so was going to work on some code
next week as well.

My plan was to simply put a kptr to the rx descriptor in the xdp
buffer. If I can read the rx descriptor I can read the timestamp,
the rxhash and any other metadata the NIC has completed. All the
drivers I've looked at stash the data here.

I'll inline pro/cons compared to this below as I see it.

> 
> Pros:
> - we avoid BTF complexity; the BPF programs themselves are now responsible
>   for agreeing on the metadata layout with the AF_XDP consumer

Same no BTF is needed in kernel side. Userspace and BPF progs get
to sort it out.

> - the metadata is free if not used

Same.

> - the metadata should, in theory, be cheap if used; kfuncs should be
>   unrolled to the same code as if the metadata was pre-populated and
>   passed with a BTF id

Same its just a kptr at this point. Also one more advantage would
be ability to read the data without copying it.

> - it's not all or nothing; users can use small subset of metadata which
>   is more efficient than the BTF id approach where all metadata has to be
>   exposed for every frame (and selectively consumed by the users)

Same.

> 
> Cons:
> - forwarding has to be handled explicitly; the BPF programs have to
>   agree on the metadata layout (IOW, the forwarding program
>   has to be aware of the final AF_XDP consumer metadata layout)

Same although IMO this is a PRO. You only get the bytes you need
and care about and can also augment it with extra good stuff so
calculation only happen once.

> - TX picture is not clear; but it's not clear with BTF ids as well;
>   I think we've agreed that just reusing whatever we have at RX
>   won't fly at TX; seems like TX XDP program might be the answer
>   here? (with a set of another tx kfuncs to "expose" bpf/af_xdp metatata
>   back into the kernel)


Agree TX is not addressed.


A bit of extra commentary. By exposing the raw kptr to the rx
descriptor we don't need driver writers to do anything. And
can easily support all the drivers out the gate with simple
one or two line changes. This pushes the interesting parts
into userspace and then BPF writers get to do the work without
bother driver folks and also if its not done today it doesn't
matter because user space can come along and make it work
later. So no scattered kernel dependencies which I really
would like to avoid here. Its actually very painful to have
to support clusters with N kernels and M devices if they
have different features. Doable but annoying and much nicer
if we just say 6.2 has support for kptr rx descriptor reading
and all XDP drivers support it. So timestamp, rxhash work
across the board.

To find the offset of fields (rxhash, timestamp) you can use
standard BTF relocations we have all this machinery built up
already for all the other structs we read, net_devices, task
structs, inodes, ... so its not a big hurdle at all IMO. We
can add userspace libs if folks really care, but its just
a read so I'm not even sure that is helpful.

I think its nicer than having kfuncs that need to be written
everywhere. My $.02 although I'll poke around with below
some as well. Feel free to just hang tight until I have some
code at the moment I have intel, mellanox drivers that I
would want to support.

> 
> Cc: Martin KaFai Lau <martin.lau@xxxxxxxxx>
> Cc: Jakub Kicinski <kuba@xxxxxxxxxx>
> Cc: Willem de Bruijn <willemb@xxxxxxxxxx>
> Cc: Jesper Dangaard Brouer <brouer@xxxxxxxxxx>
> Cc: Anatoly Burakov <anatoly.burakov@xxxxxxxxx>
> Cc: Alexander Lobakin <alexandr.lobakin@xxxxxxxxx>
> Cc: Magnus Karlsson <magnus.karlsson@xxxxxxxxx>
> Cc: Maryam Tahhan <mtahhan@xxxxxxxxxx>
> Cc: xdp-hints@xxxxxxxxxxxxxxx
> Cc: netdev@xxxxxxxxxxxxxxx
> 
> Stanislav Fomichev (5):
>   bpf: Support inlined/unrolled kfuncs for xdp metadata
>   veth: Support rx timestamp metadata for xdp
>   libbpf: Pass prog_ifindex via bpf_object_open_opts
>   selftests/bpf: Convert xskxceiver to use custom program
>   selftests/bpf: Test rx_timestamp metadata in xskxceiver
> 
>  drivers/net/veth.c                            |  31 +++++
>  include/linux/bpf.h                           |   1 +
>  include/linux/btf.h                           |   1 +
>  include/linux/btf_ids.h                       |   4 +
>  include/linux/netdevice.h                     |   3 +
>  include/net/xdp.h                             |  22 ++++
>  include/uapi/linux/bpf.h                      |   5 +
>  kernel/bpf/syscall.c                          |  28 ++++-
>  kernel/bpf/verifier.c                         |  60 +++++++++
>  net/core/dev.c                                |   7 ++
>  net/core/xdp.c                                |  28 +++++
>  tools/include/uapi/linux/bpf.h                |   5 +
>  tools/lib/bpf/libbpf.c                        |   1 +
>  tools/lib/bpf/libbpf.h                        |   6 +-
>  tools/testing/selftests/bpf/Makefile          |   1 +
>  .../testing/selftests/bpf/progs/xskxceiver.c  |  43 +++++++
>  tools/testing/selftests/bpf/xskxceiver.c      | 119 +++++++++++++++---
>  tools/testing/selftests/bpf/xskxceiver.h      |   5 +-
>  18 files changed, 348 insertions(+), 22 deletions(-)
>  create mode 100644 tools/testing/selftests/bpf/progs/xskxceiver.c
> 
> -- 
> 2.38.1.273.g43a17bfeac-goog
> 





[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux