Re: [xdp-hints] Re: [RFC bpf-next 0/5] xdp: hints via kfuncs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 03/11/2022 01.09, Toke Høiland-Jørgensen wrote:
Stanislav Fomichev <sdf@xxxxxxxxxx> writes:

On Wed, Nov 2, 2022 at 3:02 PM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:

Jesper Dangaard Brouer <jbrouer@xxxxxxxxxx> writes:

On 01/11/2022 18.05, Martin KaFai Lau wrote:
On 10/31/22 6:59 PM, Stanislav Fomichev wrote:
On Mon, Oct 31, 2022 at 3:57 PM Martin KaFai Lau
<martin.lau@xxxxxxxxx> wrote:

On 10/31/22 10:00 AM, Stanislav Fomichev wrote:
2. AF_XDP programs won't be able to access the metadata without
using a
custom XDP program that calls the kfuncs and puts the data into the
metadata area. We could solve this with some code in libxdp,
though; if
this code can be made generic enough (so it just dumps the available
metadata functions from the running kernel at load time), it may be
possible to make it generic enough that it will be forward-compatible
with new versions of the kernel that add new fields, which should
alleviate Florian's concern about keeping things in sync.

Good point. I had to convert to a custom program to use the kfuncs :-(
But your suggestion sounds good; maybe libxdp can accept some extra
info about at which offset the user would like to place the metadata
and the library can generate the required bytecode?

3. It will make it harder to consume the metadata when building
SKBs. I
think the CPUMAP and veth use cases are also quite important, and that
we want metadata to be available for building SKBs in this path. Maybe
this can be resolved by having a convenient kfunc for this that can be
used for programs doing such redirects. E.g., you could just call
xdp_copy_metadata_for_skb() before doing the bpf_redirect, and that
would recursively expand into all the kfunc calls needed to extract
the
metadata supported by the SKB path?

So this xdp_copy_metadata_for_skb will create a metadata layout that

Can the xdp_copy_metadata_for_skb be written as a bpf prog itself?
Not sure where is the best point to specify this prog though.
Somehow during
bpf_xdp_redirect_map?
or this prog belongs to the target cpumap and the xdp prog
redirecting to this
cpumap has to write the meta layout in a way that the cpumap is
expecting?

We're probably interested in triggering it from the places where xdp
frames can eventually be converted into skbs?
So for plain 'return XDP_PASS' and things like bpf_redirect/etc? (IOW,
anything that's not XDP_DROP / AF_XDP redirect).
We can probably make it magically work, and can generate
kernel-digestible metadata whenever data == data_meta, but the
question - should we?
(need to make sure we won't regress any existing cases that are not
relying on the metadata)

Instead of having some kernel-digestible meta data, how about calling
another bpf prog to initialize the skb fields from the meta area after
__xdp_build_skb_from_frame() in the cpumap, so
run_xdp_set_skb_fileds_from_metadata() may be a better name.


I very much like this idea of calling another bpf prog to initialize the
SKB fields from the meta area. (As a reminder, data need to come from
meta area, because at this point the hardware RX-desc is out-of-scope).
I'm onboard with xdp_copy_metadata_for_skb() populating the meta area.

We could invoke this BPF-prog inside __xdp_build_skb_from_frame().

We might need a new BPF_PROG_TYPE_XDP2SKB as this new BPF-prog
run_xdp_set_skb_fields_from_metadata() would need both xdp_buff + SKB as
context inputs. Right?  (Not sure, if this is acceptable with the BPF
maintainers new rules)

The xdp_prog@rx sets the meta data and then redirect.  If the
xdp_prog@rx can also specify a xdp prog to initialize the skb fields
from the meta area, then there is no need to have a kfunc to enforce a
kernel-digestible layout.  Not sure what is a good way to specify this
xdp_prog though...

The challenge of running this (BPF_PROG_TYPE_XDP2SKB) BPF-prog inside
__xdp_build_skb_from_frame() is that it need to know howto decode the
meta area for every device driver or XDP-prog populating this (as veth
and cpumap can get redirected packets from multiple device drivers).

If we have the helper to copy the data "out of" the drivers, why do we
need a second BPF program to copy data to the SKB?


IMHO the second BPF program to populate the SKB is needed to add
flexibility and extensibility.

My end-goal here is to speedup packet parsing.
This BPF-prog should (in time) be able to update skb->transport_header
and skb->network_header.  As I mentioned before, HW RX-hash already tell
us the L3 and L4 protocols and in-most-cases header-len.  Even without
HW-hints, the XDP-prog likely have parsed the packet once. This parse
information is lost today, and redone by netstack. What about storing
this header parse info in meta data and re-use in this new XDP2SKB hook?

The reason for suggesting this BPF-prog to be a callback, associated
with the net_device, were that HW is going to differ on what HW hints
that HW support.  Thus, we can avoid a generic C-function that need to
check for all the possible hints, and instead have a BPF-prog that only
contains the code that is relevant for this net_device.


I.e., the XDP program calls xdp_copy_metadata_for_skb(); this invokes
each of the kfuncs needed for the metadata used by SKBs, all of which
get unrolled. The helper takes the output of these metadata-extracting
kfuncs and stores it "somewhere". This "somewhere" could well be the
metadata area; but in any case, since it's hidden away inside a helper
(or kfunc) from the calling XDP program's PoV, the helper can just stash
all the data in a fixed format, which __xdp_build_skb_from_frame() can
then just read statically. We could even make this format match the
field layout of struct sk_buff, so all we have to do is memcpy a
contiguous chunk of memory when building the SKB.

+1

Sorry, I think this "hiding" layout trick is going in a wrong direction.

Imagine the use-case of cpumap redirect.  The physical device XDP-prog
calls xdp_copy_metadata_for_skb() to extract info from RX-desc, then it
calls redirect into cpumap.  On remote CPU, the xdp_frame is picked up,
and then I want to run another XDP-prog that want to look at these
HW-hints, and then likely call XDP_PASS which creates the SKB, also
using these HW-hints.  I take it, it would not be possible when using
the xdp_copy_metadata_for_skb() helper?


I'm currently doing exactly what you're suggesting (minus matching skb layout):

struct xdp_to_skb_metadata {
   u32 magic; // randomized at boot
   ... skb-consumable-metadata in fixed format
} __randomize_layout;

bpf_xdp_copy_metadata_for_skb() does bpf_xdp_adjust_meta(ctx,
-sizeof(struct xdp_to_skb_metadata)) and then calls a bunch of kfuncs
to fill in the actual data.

Then, at __xdp_build_skb_from_frame time, I'm having a regular kernel
C code that parses that 'struct xdp_to_skb_metadata'.
(To be precise, I'm trying to parse the metadata from
skb_metadata_set; it's called from __xdp_build_skb_from_frame, but not
100% sure that's the right place).
(I also randomize the layout and magic to make sure userspace doesn't
depend on it because nothing stops this packet to be routed into xsk
socket..)

Ah, nice trick with __randomize_layout - I agree we need to do something
to prevent userspace from inadvertently starting to rely on this, and
this seems like a great solution!

Sorry, I disagree where this is going.  Why do all of a sudden want to
prevent userspace (e.g. AF_XDP) from using this data?!?

The hole exercise started with wanting to provide AF_XDP with these
HW-hints. The hints a standard AF_XDP user wants is likely very similar
to what the SKB user wants.  Why do the AF_XDP user need to open code this?

The BTF-ID scheme precisely allows us to expose this layout to
userspace, and at the same time have freedom to change this in kernel
space, as userspace must decode the BTF-layout before reading this.
I was hoping xdp_copy_metadata_for_skb() could simply use the BTF-ID
scheme, with the BTF-ID of struct xdp_hints_rx_common, is to too much to
ask for?  You can just consider the BTF-ID as the magic number, as it
will be more-or-less random per kernel build (and module load order).

Look forward to seeing what the whole thing looks like in a more
complete form :)

I'm sort of on-board with the kfuncs and unroll-tricks, if I can see
some driver code that handles the issues of getting HW setup state
exposed needed to decode the RX-desc format.

I sense that I myself, haven't been good enough to explain/convey the
BTF-ID scheme.  Next week, I will code some examples that demo how
BTF-IDs can be used from BPF-progs, even as a communication channel
between different BPF-progs (e.g. drv XDP-prog -> cpumap XDP-prog ->
TC-BPF).

--Jesper




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux