Re: [RFC bpf-next 0/5] xdp: hints via kfuncs

John Fastabend <john.fastabend@xxxxxxxxx> · Fri, 28 Oct 2022 16:16:17 -0700

Stanislav Fomichev wrote:
> On Fri, Oct 28, 2022 at 11:05 AM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> >
> > On Fri, 28 Oct 2022 08:58:18 -0700 John Fastabend wrote:
> > > A bit of extra commentary. By exposing the raw kptr to the rx
> > > descriptor we don't need driver writers to do anything.
> > > And can easily support all the drivers out the gate with simple
> > > one or two line changes. This pushes the interesting parts
> > > into userspace and then BPF writers get to do the work without
> > > bother driver folks and also if its not done today it doesn't
> > > matter because user space can come along and make it work
> > > later. So no scattered kernel dependencies which I really
> > > would like to avoid here. Its actually very painful to have
> > > to support clusters with N kernels and M devices if they
> > > have different features. Doable but annoying and much nicer
> > > if we just say 6.2 has support for kptr rx descriptor reading
> > > and all XDP drivers support it. So timestamp, rxhash work
> > > across the board.
> >
> > IMHO that's a bit of wishful thinking. Driver support is just a small
> > piece, you'll have different HW and FW versions, feature conflicts etc.
> > In the end kernel version is just one variable and there are many others
> > you'll already have to track.

Agree.

> >
> > And it's actually harder to abstract away inter HW generation
> > differences if the user space code has to handle all of it.

I don't see how its any harder in practice though?

> 
> I've had the same concern:
> 
> Until we have some userspace library that abstracts all these details,
> it's not really convenient to use. IIUC, with a kptr, I'd get a blob
> of data and I need to go through the code and see what particular type
> it represents for my particular device and how the data I need is
> represented there. There are also these "if this is device v1 -> use
> v1 descriptor format; if it's a v2->use this another struct; etc"
> complexities that we'll be pushing onto the users. With kfuncs, we put
> this burden on the driver developers, but I agree that the drawback
> here is that we actually have to wait for the implementations to catch
> up.

I agree with everything there, you will get a blob of data and then
will need to know what field you want to read using BTF. But, we
already do this for BPF programs all over the place so its not a big
lift for us. All other BPF tracing/observability requires the same
logic. I think users of BPF in general perhaps XDP/tc are the only
place left to write BPF programs without thinking about BTF and
kernel data structures.

But, with proposed kptr the complexity lives in userspace and can be
fixed, added, updated without having to bother with kernel updates, etc.
>From my point of view of supporting Cilium its a win and much preferred
to having to deal with driver owners on all cloud vendors, distributions,
and so on.

If vendor updates firmware with new fields I get those immediately.

> 
> Jakub mentions FW and I haven't even thought about that; so yeah, bpf
> programs might have to take a lot of other state into consideration
> when parsing the descriptors; all those details do seem like they
> belong to the driver code.

I would prefer to avoid being stuck on requiring driver writers to
be involved. With just a kptr I can support the device and any
firwmare versions without requiring help.

> 
> Feel free to send it early with just a handful of drivers implemented;
> I'm more interested about bpf/af_xdp/user api story; if we have some
> nice sample/test case that shows how the metadata can be used, that
> might push us closer to the agreement on the best way to proceed.

I'll try to do a intel and mlx implementation to get a cross section.
I have a good collection of nics here so should be able to show a
couple firmware versions. It could be fine I think to have the raw
kptr access and then also kfuncs for some things perhaps.

> 
> 
> 
> > > To find the offset of fields (rxhash, timestamp) you can use
> > > standard BTF relocations we have all this machinery built up
> > > already for all the other structs we read, net_devices, task
> > > structs, inodes, ... so its not a big hurdle at all IMO. We
> > > can add userspace libs if folks really care, but its just a read so
> > > I'm not even sure that is helpful.
> > >
> > > I think its nicer than having kfuncs that need to be written
> > > everywhere. My $.02 although I'll poke around with below
> > > some as well. Feel free to just hang tight until I have some
> > > code at the moment I have intel, mellanox drivers that I
> > > would want to support.
> >
> > I'd prefer if we left the door open for new vendors. Punting descriptor
> > parsing to user space will indeed result in what you just said - major
> > vendors are supported and that's it.

I'm not sure about why it would make it harder for new vendors? I think
the opposite, it would be easier because I don't need vendor support
at all. Thinking it over seems there could be room for both.

Thanks!