Re: XDP-hints: Howto support multiple BTF types per packet basis?

John Fastabend <john.fastabend@xxxxxxxxx> · Thu, 27 May 2021 22:48:50 -0700

Andrii Nakryiko wrote:
> On Wed, May 26, 2021 at 5:44 PM John Fastabend <john.fastabend@xxxxxxxxx> wrote:
> >
> > [...]

[...]

> > > > Next up write some XDP program to do something with it,
> > > >
> > > >  void myxdp_prog(struct xdp_md *ctx) {
> > > >         struct mynic_metadata *m = (struct mynic_metadata *)(long)ctx->data_meta;
> > > >
> > > >         // now I can get data using normal CO-RE
> > > >         // I usually have this _(&) macro to put CO-RE attributes in; I
> > > >         // believe that is standard? Or use the other macros
> > > >         __u64 pkt_type = _(&m->pkt_type);
> > >
> > > add __attribute__((preserve_access_index)) to the struct
> > > mynic_metadata above (when compiling your BPF program) and you don't
> > > need _() ugliness:
> >
> > +1. Although sometimes I like the ugliness so I can keep track
> > of what's in CO-RE and what's not.
> 
> Oh, I'm just against using underscore as an identifier, I'd use
> something a bit more explicit.

Sure.
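
For reference, a minimal sketch of the annotated-struct variant suggested
above. The field names, the CORE_RELOC macro and the mock context are all
assumptions for illustration. The attribute is only applied when building
for the BPF target with clang, so the same file also compiles as ordinary C.

```c
#include <stddef.h>
#include <stdint.h>

/* preserve_access_index is a clang attribute for the BPF target; make it
 * a no-op elsewhere so this sketch also builds as ordinary C. */
#if defined(__clang__) && defined(__bpf__)
#define CORE_RELOC __attribute__((preserve_access_index))
#else
#define CORE_RELOC
#endif

/* Hypothetical NIC metadata layout; the field names are assumptions. */
struct mynic_metadata {
	uint64_t pkt_type;
	uint32_t rx_hash;
} CORE_RELOC;

/* Stand-in for struct xdp_md, holding plain pointers instead of the
 * __u32 fields the real context uses. */
struct mock_xdp_ctx {
	const void *data_meta;
	const void *data;
};

/* Returns 0 and stores pkt_type if the metadata area is large enough,
 * -1 otherwise. The bounds check mirrors what the verifier demands
 * before any access through data_meta. */
int read_pkt_type(const struct mock_xdp_ctx *ctx, uint64_t *out)
{
	const struct mynic_metadata *m = ctx->data_meta;

	if ((const char *)ctx->data_meta + sizeof(*m) > (const char *)ctx->data)
		return -1;

	*out = m->pkt_type; /* plain access, relocatable when built for BPF */
	return 0;
}
```

In a real XDP program the pointers would come from
(void *)(long)ctx->data_meta and (void *)(long)ctx->data.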

> 
> >
> > >
> > > __u64 pkt_type = m->pkt_type; /* it's CO-RE relocatable already */
> > >
> > > we have preserve_access_index as a code block (some selftests do this)
> > > for cases when you can't annotate types
> > >
> > > >
> > > >         // we can even walk into structs if we have probe read
> > > >         // around.
> > > >         struct mynic_rx_descriptor *rxdesc = _(&m->ptr_to_rx);
> > > >
> > > >         // now do whatever I like with above metadata
> > > >  }
> > > >
> > > > Run above program through normal CO-RE pass and as long as it has
> > > > access to the BTF from above it will work. I have some logic
> > > > sitting around to stitch two BTF blocks together but we have
> > > > that now done properly for linking.
> > >
> > > "stitching BTF blocks together" sort of jumped out of nowhere, what is
> > > this needed for? And not sure what "BTF block" means exactly, it's
> > > new terminology.
> >
> > I didn't know what the correct terminology here would be.
> 
> I just wasn't sure if "BTF block" is a single BTF type or it's a
> collection of types built on top of vmlinux BTF (what we call split
> BTF). Seems like it's the latter.

Yep, collection of types. Also we have all the BTF writers there so
it's easy to create them from whatever backend is creating the
hardware configuration/ucode.

> 
> >
> > What I meant is I think what you have here,
> >
> > "
> >  BTW, not that I encourage such abuse, but for the experiment's sake,
> >  you can (ab)use module BTFs mechanism today to allow dynamically
> >  adding/removing split BTFs built on top of kernel (vmlinux) BTF
> > "
> >
> > So if a vendor/driver writer has a BTF file for whatever the current
> > hardware is doing we can use the split BTF build mechanism to
> > include it. This can be used to get Jesper's dynamic reprogram
> > hardware example. We just need some way to get the BTF of the
> > currently running hardware. What I'm suggesting is that, to get
> > going, we can just take that out of band; libbpf/kernel don't have
> > to care where it comes from as long as libbpf can consume the
> > split BTFs before doing CO-RE.
> >
> > With this model I can have a single XDP program and it will
> > run on multiple hardware or the same hardware across updates
> > when I can use the normal CO-RE macros to access the metadata.
> > When I update my hardware I just need to get ahold of the
> > BTF when I do that update and my programs will continue to
> > work.
> >
> > Once we show the value of above we can talk about a driver
> > mechanism to expose the BTF over some interface, maybe in
> > /sys/fs. But that would still look like a split BTF from libbpf
> > side. The advantage is it should work today.
> 
> Right, except I don't think we have libbpf APIs to specify this, but
> that's solvable.

Sure, I believe I just pulled some internals out to get it
working. It shouldn't be too difficult to do it correctly.
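
To make the idea concrete, here is a rough userspace sketch of what
consuming the NIC BTF before doing CO-RE buys us. It is only an analogy:
the field table stands in for BTF, and resolve_field()/read_u32_field()
are made-up helpers, not libbpf APIs. The same reader works against two
different hardware layouts because offsets are looked up by field name
rather than compiled in.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry of a toy "BTF": a field name plus its offset and size in the
 * metadata area for the currently loaded hardware configuration. */
struct field_desc {
	const char *name;
	size_t offset;
	size_t size;
};

struct layout {
	const struct field_desc *fields;
	size_t nr_fields;
};

/* Look a field up by name, the way CO-RE relocation matches by name. */
const struct field_desc *resolve_field(const struct layout *l, const char *name)
{
	for (size_t i = 0; i < l->nr_fields; i++)
		if (strcmp(l->fields[i].name, name) == 0)
			return &l->fields[i];
	return NULL;
}

/* Read a 32-bit field from raw metadata using the resolved offset.
 * Returns 0 on success, -1 if the field is missing, has the wrong size,
 * or does not fit inside the metadata area. */
int read_u32_field(const struct layout *l, const uint8_t *meta, size_t meta_len,
		   const char *name, uint32_t *out)
{
	const struct field_desc *f = resolve_field(l, name);

	if (!f || f->size != sizeof(*out) || f->offset + f->size > meta_len)
		return -1;
	memcpy(out, meta + f->offset, sizeof(*out));
	return 0;
}
```

In the real flow libbpf would resolve the offsets once at load time and
patch them into the program, rather than doing a lookup per packet.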

> 
> >
> > I called the process of taking two BTF files, vmlinux BTF and
> > user provided NIC metadata BTF, and using those for CO-RE
> > logic "stitching BTF blocks together".
> >
> > >
> > > >
> > > > probe_read from XDP should be added regardless of above. I've
> > > > found it super handy in skmsg programs to dig out kernel info
> > > > inline. With probe_read we can also start to walk net_device
> > > > struct for more detailed info as needed. Or into sock structs
> > >
> > > yes, libbpf provides the BPF_CORE_READ() macro that lets you walk across
> > > structs referenced by pointers, e.g.:
> > >
> > > int my_data = BPF_CORE_READ(m, ptr_to_rx, rx_field);
> > >
> > > is the logical equivalent of
> > >
> > > int my_data = m->ptr_to_rx->rx_field;
> >
> > The only complication here is ptr_to_rx is outside XDP data
> > so we need XDP program to support probe_read(). So depending
> > on current capabilities a BPF program might be limited to
> > just its own data block or with higher caps able to use
> > more of the features.
> >
> 
> Right.

Likely start with just metadata and worry about probe later. Anyway,
I think it would be useful to have probe read for netdev, sock and
task structs, independent of this thread.
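
As a rough illustration of why the pointer walk needs probe read:
BPF_CORE_READ(m, ptr_to_rx, rx_field) boils down to one checked read per
pointer hop. The sketch below mimics that in plain C. mock_probe_read()
is a stand-in I made up for the real helper (it takes a base and an
offset and only guards against NULL), and the struct fields are
assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct mynic_rx_descriptor {
	int rx_field;
};

struct mynic_metadata {
	uint64_t pkt_type;
	struct mynic_rx_descriptor *ptr_to_rx;
};

/* Mock probe read: copy size bytes from base + offset, failing instead
 * of faulting. Here an "unsafe" address just means a NULL base. As with
 * the real helper, the destination is zeroed on failure. */
int mock_probe_read(void *dst, size_t size, const void *base, size_t offset)
{
	if (!base) {
		memset(dst, 0, size);
		return -1;
	}
	memcpy(dst, (const char *)base + offset, size);
	return 0;
}

/* Logical equivalent of m->ptr_to_rx->rx_field, one checked read per hop. */
int read_rx_field(const struct mynic_metadata *m, int *out)
{
	struct mynic_rx_descriptor *rxdesc;

	/* Hop 1: fetch the pointer stored in the metadata. */
	if (mock_probe_read(&rxdesc, sizeof(rxdesc), m,
			    offsetof(struct mynic_metadata, ptr_to_rx)))
		return -1;

	/* Hop 2: fetch the field behind that pointer. */
	return mock_probe_read(out, sizeof(*out), rxdesc,
			       offsetof(struct mynic_rx_descriptor, rx_field));
}
```

The first hop stays inside the metadata area; only the second one reaches
outside XDP data, which is the part that needs the extra capability.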

[...]

> > > union and independent set of BTFs are two different things, I'll let
> > > you guys figure out which one you need, but I replied with what it
> > > could look like in the CO-RE world
> >
> > I think a union is sufficient and more aligned with how the
> > hardware would actually work.
> 
> Sure. And I think those are two orthogonal concerns. You can start
> with a single struct mynic_metadata with union inside it, and later
> add the ability to swap mynic_metadata with another
> mynic_metadata___v2 that will have a similar union but with a
> different layout.

Right, and then you just have the normal upgrade/downgrade problems you
get with any struct.
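
A sketch of what that could look like, with made-up fields: one struct
with a union selected by pkt_type, plus a ___v2 flavor with a different
layout. libbpf ignores the triple-underscore suffix when matching type
names, so both refer to "mynic_metadata" as far as CO-RE is concerned.
Everything below (field names, pkt_type values, the get_hash helpers) is
hypothetical.

```c
#include <stdint.h>

enum mynic_pkt_type {
	MYNIC_PKT_L2 = 0,
	MYNIC_PKT_TUNNEL = 1,
};

/* Original layout: a tag followed by a per-packet-type union. */
struct mynic_metadata {
	uint32_t pkt_type;
	union {
		struct { uint32_t rx_hash; } l2;
		struct { uint32_t tunnel_id; uint32_t inner_hash; } tunnel;
	};
};

/* Newer layout: same union idea, but a timestamp was added up front,
 * which shifts every other field. */
struct mynic_metadata___v2 {
	uint64_t timestamp;
	uint32_t pkt_type;
	union {
		struct { uint32_t rx_hash; } l2;
		struct { uint32_t tunnel_id; uint32_t inner_hash; } tunnel;
	};
};

/* Pick the hash from whichever union member pkt_type selects. */
uint32_t mynic_get_hash(const struct mynic_metadata *m)
{
	switch (m->pkt_type) {
	case MYNIC_PKT_TUNNEL:
		return m->tunnel.inner_hash;
	default:
		return m->l2.rx_hash;
	}
}

/* Same logic against the v2 layout. */
uint32_t mynic_get_hash_v2(const struct mynic_metadata___v2 *m)
{
	switch (m->pkt_type) {
	case MYNIC_PKT_TUNNEL:
		return m->tunnel.inner_hash;
	default:
		return m->l2.rx_hash;
	}
}
```

A CO-RE program would presumably choose between the two flavors at load
time, for example with a bpf_core_field_exists() check on a field that
only the newer layout has.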

Seems like a workable path to me. But we need to circle back to the
"what do we want to do with it" part that Jesper replied to.

.John