Re: XDP-hints: Howto support multiple BTF types per packet basis?

Toke Høiland-Jørgensen <toke@xxxxxxxxxx> · Thu, 24 Jun 2021 18:04:48 +0200

Zvi Effron via xdp-hints <xdp-hints@xxxxxxxxxxxxxxx> writes:

> On Thu, Jun 24, 2021 at 5:23 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
>>
>> Michal Swiatkowski <michal.swiatkowski@xxxxxxxxxxxxxxx> writes:
>>
>> > On Tue, Jun 22, 2021 at 01:53:33PM +0200, Toke Høiland-Jørgensen wrote:
>> >> Michal Swiatkowski <michal.swiatkowski@xxxxxxxxxxxxxxx> writes:
>> >>
>> >> > On Wed, Jun 02, 2021 at 09:18:37AM -0700, Jakub Kicinski wrote:
>> >> >> On Tue, 01 Jun 2021 17:22:51 -0700 John Fastabend wrote:
>> >> >> > > If we do this, the BPF program obviously needs to know which fields are
>> >> >> > > valid and which are not. AFAICT you're proposing that this should be
>> >> >> > > done out-of-band (i.e., by the system administrator manually ensuring
>> >> >> > > BPF program config fits system config)? I think there are a couple of
>> >> >> > > problems with this:
>> >> >> > >
>> >> >> > > - It requires the system admin to coordinate device config with all of
>> >> >> > >   their installed XDP applications. This is error-prone, especially as
>> >> >> > >   the number of applications grows (say if different containers have
>> >> >> > >   different XDP programs installed on their virtual devices).
>> >> >> >
>> >> >> > A complete "system" will need to be choerent. If I forward into a veth
>> >> >> > device the orchestration component needs to ensure program sending
>> >> >> > bits there is using the same format the program installed there expects.
>> >> >> >
>> >> >> > If I tailcall/fentry into another program that program the callee and
>> >> >> > caller need to agree on the metadata protocol.
>> >> >> >
>> >> >> > I don't see any way around this. Someone has to manage the network.
>> >> >>
>> >> >> FWIW I'd like to +1 Toke's concerns.
>> >> >>
>> >> >> In large deployments there won't be a single arbiter. Saying there
>> >> >> is seems to contradict BPF maintainers' previous stand which lead
>> >> >> to addition of bpf_links for XDP.
>> >> >>
>> >> >> In practical terms person rolling out an NTP config change may not
>> >> >> be aware that in some part of the network some BPF program expects
>> >> >> descriptor not to contain time stamps. Besides features may depend
>> >> >> or conflict so the effects of feature changes may not be obvious
>> >> >> across multiple drivers in a heterogeneous environment.
>> >> >>
>> >> >> IMO guarding from obvious mis-configuration provides obvious value.
>> >> >
>> >> > Hi,
>> >> >
>> >> > Thanks for a lot of usefull information about CO-RE. I have read
>> >> > recommended articles, but still don't understand everything, so sorry if
>> >> > my questions are silly.
>> >> >
>> >> > As introduction, I wrote small XDP example using CO-RE (autogenerated
>> >> > vmlinux.h and getting rid of skeleton etc.) based on runqslower
>> >> > implementation. Offset reallocation of hints works great, I built CO-RE
>> >> > application, added new field to hints struct, changed struct layout and
>> >> > without rebuilding application everything still works fine. Is it worth
>> >> > to add XDP sample using CO-RE in kernel or this isn't good place for
>> >> > this kind of sample?
>> >> >
>> >> > First question not stricte related to hints. How to get rid of #define
>> >> > and macro when I am using generated vmlinux.h? For example I wanted to
>> >> > use htons macro and ethtype definition. They are located in headers that
>> >> > also contains few struct definition. Because of that I have redefinition
>> >> > error when I am trying to include them (redefinition in vmlinux.h and
>> >> > this included file). What can I do with this besides coping definitions
>> >> > to bpf code?
>> >>
>> >> One way is to only include the structs you actually need from vmlinux.h.
>> >> You can even prune struct members, since CO-RE works just fine with
>> >> partial struct definitions as long as the member names match.
>> >>
>> >> Jesper has an example on how to handle this here:
>> >> https://github.com/netoptimizer/bpf-examples/blob/ktrace01-CO-RE.public/headers/vmlinux_local.h
>> >>
>> >
>> > I see, thanks, I will take a look at other examples.
>> >
>> >> > I defined hints struct in driver code, is it right place for that? All
>> >> > vendors will define their own hints struct or the idea is to have one
>> >> > big hints struct with flags informing about availability of each fields?
>> >> >
>> >> > For me defining it in driver code was easier because I can have used
>> >> > module btf to generate vmlinux.h with hints struct inside. However this
>> >> > break portability if other vendors will have different struct name etc,
>> >> > am I right?
>> >>
>> >> I would expect the easiest is for drivers to just define their own
>> >> structs and maybe have some infrastructure in the core to let userspace
>> >> discover the right BTF IDs to use for a particular netdev. However, as
>> >> you say it's not going to work if every driver just invents their own
>> >> field names, so we'll need to coordinate somehow. We could do this by
>> >> convention, though, it'll need manual intervention to make sure the
>> >> semantics of identically-named fields match anyway.
>> >>
>> >> Cf the earlier discussion with how many BTF IDs each driver might
>> >> define, I think we *also* need a way to have flags that specify which
>> >> fields of a given BTF ID are currently used; and having some common
>> >> infrastructure for that would be good...
>> >>
>> >
>> > Sounds good.
>> >
>> > Sorry, but I feel that I don't fully understand the idea. Correct me if
>> > I am wrong:
>> >
>> > In building CO-RE application step we can defined big struct with
>> > all possible fields or even empty struct (?) and use
>> > bpf_core_field_exists.
>> >
>> > bpf_core_field_exists will be resolve before loading program by libbpf
>> > code. In normal case libbpf will look for btf with hints name in vmlinux
>> > of running kernel and do offset rewrite and exsistence check. But as the
>> > same hints struct will be define in multiple modules we want to add more
>> > logic to libbpf to discover correct BTF ID based on netdev on which program
>> > will be loaded?
>>
>> I would expect that the program would decide ahead-of-time which BTF IDs
>> it supports, by something like including the relevant structs from
>> vmlinux.h. And then we need the BTF ID encoded into the packet metadata
>> as well, so that it is possible to check at run-time which driver the
>> packet came from (since a packet can be redirected, so you may end up
>> having to deal with multiple formats in the same XDP program).
>>
>> Which would allow you to write code like:
>>
>> if (ctx->has_driver_meta) {
>>   /* this should be at a well-known position, like first (or last) in meta area */
>>   __u32 *meta_btf_id = ctx->data_meta;
>>
>>   if (*meta_btf_id == BTF_ID_MLX5) {
>>     struct meta_mlx5 *meta = ctx->data_meta;
>>     /* do something with meta */
>>   } else if (meta_btf_id == BTF_ID_I40E) {
>>     struct meta_i40e *meta = ctx->data_meta;
>>     /* do something with meta */
>>   } /* etc */
>> }
>>
>> and libbpf could do relocations based on the different meta structs,
>> even removing the code for the ones that don't exist on the running
>> kernel.
>>
>> -Toke
>>
>
> How does putting the BTF ID and the driver metadata into the XDP metadata
> section interact with programs that are already using the metadata section
> for other purposes. For example, programs that use the XDP metadata to pass
> information through BPF tail calls?
>
> Would this break existing programs that aren't aware of the new driver
> metadata? Do we need to make driver metadata opt-in at XDP program
> load?

Well, XDP applications would be free to just ignore the driver-provided
metadata and overwrite it with its own data? And I guess any application
that doesn't know about it will just implicitly do that? :)

-Toke