Re: XDP-hints: Howto support multiple BTF types per packet basis?

Toke Høiland-Jørgensen wrote:
> John Fastabend <john.fastabend@xxxxxxxxx> writes:
> 
> > Jesper Dangaard Brouer wrote:
> >
> > [...]
> >
> > I'll try to respond to both Toke and Jesper here and make it coherent so
> > we don't split this thread yet again.
> >
> > Wish me luck.
> >
> > @Toke "react" -> "not break" hopefully gives you my opinion on this.
> >
> > @Toke "five fields gives 32 different metadata formats" OK let me take
> > five fields,
> >
> >  struct meta {
> >    __u32 f1;
> >    __u32 f2;
> >    __u32 f3;
> >    __u32 f4;
> >    __u32 f5;
> >  }
> >
> > I'm still confused why the metadata would change just because a feature
> > is enabled or disabled. I've written drivers before, and I don't want to
> > move around where I write f1 based on some combination of the
> > enabled/disabled state of features f2, f3, f4, f5. If features are
> > mutually exclusive I can build a sensible union. If it's possible for all
> > fields to be enabled, then I just lay them out like above.
> 
> The assumption that the layout would be changing as the features were
> enabled came from a discussion I had with Jesper where he pointed out
> that zeroing out the fields that were not active comes with a measurable
> performance impact. So changing the struct layout to only include the
> fields that are currently used is a way to make sure we don't hurt
> performance.
> 
> If I'm understanding you correctly, what you propose instead is that we
> just keep the struct layout the same and only write the data that we
> have, leaving the other fields as uninitialised (so essentially
> garbage), right?

Correct.

> 
> If we do this, the BPF program obviously needs to know which fields are
> valid and which are not. AFAICT you're proposing that this should be
> done out-of-band (i.e., by the system administrator manually ensuring
> BPF program config fits system config)? I think there are a couple of
> problems with this:
> 
> - It requires the system admin to coordinate device config with all of
>   their installed XDP applications. This is error-prone, especially as
>   the number of applications grows (say if different containers have
>   different XDP programs installed on their virtual devices).

A complete "system" will need to be coherent. If I forward into a veth
device, the orchestration component needs to ensure the program sending
bits there uses the same format the program installed there expects.

If I tail call/fentry into another program, the callee and caller need
to agree on the metadata protocol.

I don't see any way around this. Someone has to manage the network.
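
To make "agree on the metadata protocol" concrete, here is a sketch (all
names are hypothetical, nothing here is kernel-defined): both sides build
against one shared header, and that header *is* the protocol.

  /* meta_abi.h -- hypothetical shared definition. The redirecting
   * program and the program on the veth both include this, so they
   * cannot disagree on the layout. */
  #ifndef META_ABI_H
  #define META_ABI_H

  #include <linux/types.h>

  #define META_F_RXHASH (1U << 0)   /* rxhash field is valid */
  #define META_F_TSTAMP (1U << 1)   /* timestamp field is valid */

  struct meta {
          __u32 flags;              /* META_F_* bits set by producer */
          __u32 rxhash;
          __u32 timestamp;
          __u8  vlan;
          __u8  pad[3];             /* keep size a multiple of 4 for
                                     * bpf_xdp_adjust_meta() */
  };

  #endif /* META_ABI_H */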

> 
> - It has synchronisation issues. Say I have an XDP program with optional
>   support for hardware timestamps and a software fallback. It gets
>   installed in software fallback mode; then the admin has to make sure
>   to enable the hardware timestamps before switching the application
>   into the mode where it will read that metadata field (and the opposite
>   order when disabling the hardware mode).

If you want, tell the hardware to set an enable bit and then check it on
ingress to the XDP program. What I like about this scheme is that, as a
core kernel or networking person, I don't have to care how you do this.
Figure it out with your hardware and driver.
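
Concretely, the ingress check inside the XDP program is tiny. Reusing
the hypothetical struct meta and META_F_TSTAMP bit from the sketch
above (the bounds check against data is required by the verifier
anyway, since metadata lives in [data_meta, data)):

  void *data      = (void *)(long)ctx->data;
  void *data_meta = (void *)(long)ctx->data_meta;
  struct meta *m  = data_meta;

  /* Metadata may be absent entirely; check before touching it. */
  if ((void *)(m + 1) > data)
          return XDP_PASS;          /* no metadata: software path */

  if (m->flags & META_F_TSTAMP) {
          /* hardware timestamp valid: use m->timestamp */
  } else {
          /* software fallback */
  }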

> 
> Also, we need to be able to deal with different metadata layouts on
> different packets in the same program. Consider the XDP program running
> on a veth device inside a container above: if this gets packets
> redirected into it from different NICs with different layouts, it needs
> to be able to figure out which packet came from where.

Again, I don't think this is the kernel's problem. If you have some setup
where NICs have different metadata, then you need an XDP shim to rewrite
the metadata. But, as the person who set up this system, maybe you should
avoid this for the fast path at least.
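
If you really do have that setup, the shim is small. A sketch with a
made-up NIC-specific layout (struct nic_meta) and a made-up target
ifindex; bpf_xdp_adjust_meta() and bpf_redirect() are the only real
helpers used:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>
  #include "meta_abi.h"             /* hypothetical header from above */

  #define VETH_IFINDEX 7            /* hypothetical target device */

  struct nic_meta {                 /* made-up NIC-specific layout */
          __u32 hw_hash;
          __u32 hw_tstamp;
  };

  SEC("xdp")
  int xdp_meta_shim(struct xdp_md *ctx)
  {
          void *data      = (void *)(long)ctx->data;
          void *data_meta = (void *)(long)ctx->data_meta;
          struct nic_meta nic;
          struct meta *m;

          /* Copy out the NIC-specific metadata before resizing. */
          if (data_meta + sizeof(nic) > data)
                  return XDP_DROP;
          __builtin_memcpy(&nic, data_meta, sizeof(nic));

          /* Resize the metadata area to the common layout; a
           * negative delta grows it. */
          if (bpf_xdp_adjust_meta(ctx, (int)sizeof(nic) -
                                       (int)sizeof(struct meta)))
                  return XDP_DROP;

          /* The adjust helper invalidates packet pointers; reload. */
          data      = (void *)(long)ctx->data;
          data_meta = (void *)(long)ctx->data_meta;
          m = data_meta;
          if ((void *)(m + 1) > data)
                  return XDP_DROP;

          m->flags     = META_F_RXHASH | META_F_TSTAMP;
          m->rxhash    = nic.hw_hash;
          m->timestamp = nic.hw_tstamp;
          m->vlan      = 0;

          return bpf_redirect(VETH_IFINDEX, 0);
  }

  char _license[] SEC("license") = "GPL";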

> 
> With this in mind I think we have to encode the metadata format into the
> packet metadata itself somehow. This could just be a matter of including
> the BTF ID as part of the struct itself, so that your example could
> essentially do:
> 
>   if (data->meta_btf_id == timestamp_id) {
>     struct timestamp_meta *meta = data->meta_data;
>     // do stuff
>   } else {
>     struct normal_meta *meta = data->meta_data;
>   }

I'm not sure why you call it meta_btf_id there. Why not just an
enabledTstampBit? So:

  if (mymeta->enabledTstampBit) { ... } else { /* fallback */ }

> 
> 
> and then, to avoid drivers having to define different layouts we could
> essentially have the two metadata structs be:
> 
>  struct normal_meta {
>   u32 rxhash;
>   u32 padding;
>   u8 vlan;
>  };
> 
> and
> 
>  struct timestamp_meta {
>    u32 rxhash;
>    u32 timestamp;
>    u8 vlan;
>  };

So a union of that u32 padding with the timestamp would suffice, but
sure, if you want to do it at compile time with a BTF ID, OK. I think
runtime would make more sense. Like this,

 struct meta {
   u32 rxhash;
   u32 timestamp;
   u8 vlan;
   u8 flags;
 };

 if (m->flags) { /* read m->timestamp */ }

> 
> This still gets us exponential growth in the number of metadata structs,
> but at least we could create tooling to auto-generate the BTF for the
> variants so the complexity is reduced to just consuming a lot of BTF
> IDs.

Auto-generate the BTF? I'm not sure why it's needed, to be honest. I
think for most simple things you can build a single metadata struct that
will fit. For more complex pipelines, we should generate the BTF along
with the pipeline, using P4 or such, IMO.

> 
> Alternatively we could have an explicit bitmap of valid fields, like:
> 
>  struct all_meta {
>    u32 _valid_field_bitmap;
>    u32 rxhash;
>    u32 timestamp;
>    u8 vlan;
>  };

Aha, I like this, as you can tell from above. I would have just jumped
here, but I was responding as I was reading. So I'll leave the above;
maybe it is illustrative.
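
For illustration, the open-coded version of the check CO-RE would
insert looks like this today (bit names made up):

  /* Hypothetical bit assignments for _valid_field_bitmap. */
  #define ALL_META_RXHASH_VALID (1U << 0)
  #define ALL_META_TSTAMP_VALID (1U << 1)
  #define ALL_META_VLAN_VALID   (1U << 2)

  __u32 tstamp = 0;

  if (meta->_valid_field_bitmap & ALL_META_TSTAMP_VALID)
          tstamp = meta->timestamp; /* producer wrote the field */
  /* else: the field is uninitialised garbage, take the fallback */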

> 
> and if a program reads all_meta->timestamp CO-RE could transparently
> insert a check of the relevant field in the bitmap first. My immediate
> feeling is that this last solution would be nicer: We'd still need to

+1 I think we agree.

> include the packet BTF ID in the packet data to deal with case where
> packets are coming from different interfaces, but we'd avoid having lots
> of different variants with separate BTF IDs. I'm not sure what it would
> take to teach CO-RE to support such a scheme, though...
> 
> WDYT?

I think we agree on the last option (bitmap field) to start with. Build
that out and I think you cover most hardware that is out there today.
Then the more complex cases come as step 2. And I think it's hard to
overstate that the last option above mostly works today; you don't need
to go do a bunch of kernel work.
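
To back up "mostly works today": the producer side only needs the
existing bpf_xdp_adjust_meta() helper. A sketch, again reusing the
hypothetical struct meta and flag bits from above:

  #include <linux/bpf.h>
  #include <bpf/bpf_helpers.h>
  #include "meta_abi.h"             /* hypothetical header from above */

  SEC("xdp")
  int xdp_set_meta(struct xdp_md *ctx)
  {
          void *data, *data_meta;
          struct meta *m;

          /* Grow the metadata area in front of the packet data. */
          if (bpf_xdp_adjust_meta(ctx, -(int)sizeof(*m)))
                  return XDP_ABORTED;

          /* Reload pointers after the adjust. */
          data      = (void *)(long)ctx->data;
          data_meta = (void *)(long)ctx->data_meta;
          m = data_meta;
          if ((void *)(m + 1) > data)
                  return XDP_ABORTED;

          m->flags     = META_F_RXHASH;
          m->rxhash    = 0xdeadbeef; /* stand-in for a real hash */
          m->timestamp = 0;
          m->vlan      = 0;

          return XDP_PASS;          /* metadata travels with packet */
  }

  char _license[] SEC("license") = "GPL";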

> 
> -Toke
> 





