On Thu, 24 Jun 2021 18:04:48 +0200 Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: > Zvi Effron via xdp-hints <xdp-hints@xxxxxxxxxxxxxxx> writes: > > > On Thu, Jun 24, 2021 at 5:23 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: > >> > >> Michal Swiatkowski <michal.swiatkowski@xxxxxxxxxxxxxxx> writes: > >> > >> > On Tue, Jun 22, 2021 at 01:53:33PM +0200, Toke Høiland-Jørgensen wrote: > >> >> Michal Swiatkowski <michal.swiatkowski@xxxxxxxxxxxxxxx> writes: > >> >> > >> >> > On Wed, Jun 02, 2021 at 09:18:37AM -0700, Jakub Kicinski wrote: > >> >> >> On Tue, 01 Jun 2021 17:22:51 -0700 John Fastabend wrote: > >> >> >> > > If we do this, the BPF program obviously needs to know which fields are > >> >> >> > > valid and which are not. AFAICT you're proposing that this should be > >> >> >> > > done out-of-band (i.e., by the system administrator manually ensuring > >> >> >> > > BPF program config fits system config)? I think there are a couple of > >> >> >> > > problems with this: > >> >> >> > > > >> >> >> > > - It requires the system admin to coordinate device config with all of > >> >> >> > > their installed XDP applications. This is error-prone, especially as > >> >> >> > > the number of applications grows (say if different containers have > >> >> >> > > different XDP programs installed on their virtual devices). > >> >> >> > > >> >> >> > A complete "system" will need to be choerent. If I forward into a veth > >> >> >> > device the orchestration component needs to ensure program sending > >> >> >> > bits there is using the same format the program installed there expects. > >> >> >> > > >> >> >> > If I tailcall/fentry into another program that program the callee and > >> >> >> > caller need to agree on the metadata protocol. > >> >> >> > > >> >> >> > I don't see any way around this. Someone has to manage the network. > >> >> >> > >> >> >> FWIW I'd like to +1 Toke's concerns. > >> >> >> > >> >> >> In large deployments there won't be a single arbiter. Saying there > >> >> >> is seems to contradict BPF maintainers' previous stand which lead > >> >> >> to addition of bpf_links for XDP. > >> >> >> > >> >> >> In practical terms person rolling out an NTP config change may not > >> >> >> be aware that in some part of the network some BPF program expects > >> >> >> descriptor not to contain time stamps. Besides features may depend > >> >> >> or conflict so the effects of feature changes may not be obvious > >> >> >> across multiple drivers in a heterogeneous environment. > >> >> >> > >> >> >> IMO guarding from obvious mis-configuration provides obvious value. > >> >> > > >> >> > Hi, > >> >> > > >> >> > Thanks for a lot of usefull information about CO-RE. I have read > >> >> > recommended articles, but still don't understand everything, so sorry if > >> >> > my questions are silly. > >> >> > > >> >> > As introduction, I wrote small XDP example using CO-RE (autogenerated > >> >> > vmlinux.h and getting rid of skeleton etc.) based on runqslower > >> >> > implementation. Offset reallocation of hints works great, I built CO-RE > >> >> > application, added new field to hints struct, changed struct layout and > >> >> > without rebuilding application everything still works fine. Is it worth > >> >> > to add XDP sample using CO-RE in kernel or this isn't good place for > >> >> > this kind of sample? > >> >> > > >> >> > First question not stricte related to hints. How to get rid of #define > >> >> > and macro when I am using generated vmlinux.h? For example I wanted to > >> >> > use htons macro and ethtype definition. They are located in headers that > >> >> > also contains few struct definition. Because of that I have redefinition > >> >> > error when I am trying to include them (redefinition in vmlinux.h and > >> >> > this included file). What can I do with this besides coping definitions > >> >> > to bpf code? > >> >> > >> >> One way is to only include the structs you actually need from vmlinux.h. > >> >> You can even prune struct members, since CO-RE works just fine with > >> >> partial struct definitions as long as the member names match. > >> >> > >> >> Jesper has an example on how to handle this here: > >> >> https://github.com/netoptimizer/bpf-examples/blob/ktrace01-CO-RE.public/headers/vmlinux_local.h Above links to my experimental "learning-by-doing" branch. I've created a PR to merge this officially here: https://github.com/xdp-project/bpf-examples/pull/24/ > >> > > >> > I see, thanks, I will take a look at other examples. > >> > > >> >> > I defined hints struct in driver code, is it right place for that? All > >> >> > vendors will define their own hints struct or the idea is to have one > >> >> > big hints struct with flags informing about availability of each fields? > >> >> > > >> >> > For me defining it in driver code was easier because I can have used > >> >> > module btf to generate vmlinux.h with hints struct inside. However this > >> >> > break portability if other vendors will have different struct name etc, > >> >> > am I right? > >> >> > >> >> I would expect the easiest is for drivers to just define their own > >> >> structs and maybe have some infrastructure in the core to let userspace > >> >> discover the right BTF IDs to use for a particular netdev. However, as > >> >> you say it's not going to work if every driver just invents their own > >> >> field names, so we'll need to coordinate somehow. We could do this by > >> >> convention, though, it'll need manual intervention to make sure the > >> >> semantics of identically-named fields match anyway. > >> >> > >> >> Cf the earlier discussion with how many BTF IDs each driver might > >> >> define, I think we *also* need a way to have flags that specify which > >> >> fields of a given BTF ID are currently used; and having some common > >> >> infrastructure for that would be good... > >> >> > >> > > >> > Sounds good. > >> > > >> > Sorry, but I feel that I don't fully understand the idea. Correct me if > >> > I am wrong: > >> > > >> > In building CO-RE application step we can defined big struct with > >> > all possible fields or even empty struct (?) and use > >> > bpf_core_field_exists. > >> > > >> > bpf_core_field_exists will be resolve before loading program by libbpf > >> > code. In normal case libbpf will look for btf with hints name in vmlinux > >> > of running kernel and do offset rewrite and exsistence check. But as the > >> > same hints struct will be define in multiple modules we want to add more > >> > logic to libbpf to discover correct BTF ID based on netdev on which program > >> > will be loaded? > >> > >> I would expect that the program would decide ahead-of-time which BTF IDs > >> it supports, by something like including the relevant structs from > >> vmlinux.h. And then we need the BTF ID encoded into the packet metadata > >> as well, so that it is possible to check at run-time which driver the > >> packet came from (since a packet can be redirected, so you may end up > >> having to deal with multiple formats in the same XDP program). > >> > >> Which would allow you to write code like: > >> > >> if (ctx->has_driver_meta) { > >> /* this should be at a well-known position, like first (or last) in meta area */ > >> __u32 *meta_btf_id = ctx->data_meta; > >> > >> if (*meta_btf_id == BTF_ID_MLX5) { > >> struct meta_mlx5 *meta = ctx->data_meta; > >> /* do something with meta */ > >> } else if (meta_btf_id == BTF_ID_I40E) { > >> struct meta_i40e *meta = ctx->data_meta; > >> /* do something with meta */ > >> } /* etc */ > >> } > >> > >> and libbpf could do relocations based on the different meta structs, > >> even removing the code for the ones that don't exist on the running > >> kernel. > >> > >> -Toke > >> > > > > How does putting the BTF ID and the driver metadata into the XDP metadata > > section interact with programs that are already using the metadata section > > for other purposes. For example, programs that use the XDP metadata to pass > > information through BPF tail calls? > > > > Would this break existing programs that aren't aware of the new driver > > metadata? Do we need to make driver metadata opt-in at XDP program > > load? > > Well, XDP applications would be free to just ignore the driver-provided > metadata and overwrite it with its own data? And I guess any application > that doesn't know about it will just implicitly do that? :) Remember to wrap your head around: That metadata area "grows" via minus offset as ctx->data_meta points to area before ctx->data. See[1] where bpf_xdp_adjust_meta() helper does a minus adjustment. [1] https://github.com/torvalds/linux/blob/v5.13-rc7/samples/bpf/xdp2skb_meta_kern.c#L41 Thus, AFAIK if the driver already added some metadata before your XDP-prog, then this call[1] will just move ctx->data_meta some-more to make room for *your* metadata (and driver metadata will be "after"). When using this metadata area, e.g.[2] then it will point to the metadata you added. [2] https://github.com/torvalds/linux/blob/v5.13-rc7/samples/bpf/xdp2skb_meta_kern.c#L78 Notice, this is also the reason, we (Bjørn, Magnus + I) suggested that the btf_id (for AF_XDP use-case) should be placed as the "last" element in the metadata struct, as it will be located at (ctx->data - 4 bytes). -- Best regards, Jesper Dangaard Brouer MSc.CS, Principal Kernel Engineer at Red Hat LinkedIn: http://www.linkedin.com/in/brouer