Hi all,

I have been closely following this discussion for some time now. It seems we
have reached the point where it's getting interesting for me.

On Fri, 2022-10-28 at 18:14 -0700, Jakub Kicinski wrote:
> On Fri, 28 Oct 2022 16:16:17 -0700 John Fastabend wrote:
> > > > And it's actually harder to abstract away inter HW generation
> > > > differences if the user space code has to handle all of it.
> >
> > I don't see how its any harder in practice though?
>
> You need to find out what HW/FW/config you're running, right?
> And all you have is a pointer to a blob of unknown type.
>
> Take timestamps for example, some NICs support adjusting the PHC
> or doing SW corrections (with different versions of hw/fw/server
> platforms being capable of both/one/neither).
>
> Sure you can extract all this info with tracing and careful
> inspection via uAPI. But I don't think that's _easier_.
> And the vendors can't run the results thru their validation
> (for whatever that's worth).
>
> > > I've had the same concern:
> > >
> > > Until we have some userspace library that abstracts all these details,
> > > it's not really convenient to use. IIUC, with a kptr, I'd get a blob
> > > of data and I need to go through the code and see what particular type
> > > it represents for my particular device and how the data I need is
> > > represented there. There are also these "if this is device v1 -> use
> > > v1 descriptor format; if it's a v2->use this another struct; etc"
> > > complexities that we'll be pushing onto the users. With kfuncs, we put
> > > this burden on the driver developers, but I agree that the drawback
> > > here is that we actually have to wait for the implementations to catch
> > > up.
> >
> > I agree with everything there, you will get a blob of data and then
> > will need to know what field you want to read using BTF. But, we
> > already do this for BPF programs all over the place so its not a big
> > lift for us. All other BPF tracing/observability requires the same
> > logic.
> > I think users of BPF in general perhaps XDP/tc are the only
> > place left to write BPF programs without thinking about BTF and
> > kernel data structures.
> >
> > But, with proposed kptr the complexity lives in userspace and can be
> > fixed, added, updated without having to bother with kernel updates, etc.
> > From my point of view of supporting Cilium its a win and much preferred
> > to having to deal with driver owners on all cloud vendors, distributions,
> > and so on.
> >
> > If vendor updates firmware with new fields I get those immediately.
>
> Conversely it's a valid concern that those who *do* actually update
> their kernel regularly will have more things to worry about.
>
> > > Jakub mentions FW and I haven't even thought about that; so yeah, bpf
> > > programs might have to take a lot of other state into consideration
> > > when parsing the descriptors; all those details do seem like they
> > > belong to the driver code.
> >
> > I would prefer to avoid being stuck on requiring driver writers to
> > be involved. With just a kptr I can support the device and any
> > firwmare versions without requiring help.
>
> 1) where are you getting all those HW / FW specs :S
> 2) maybe *you* can but you're not exactly not an ex-driver developer :S
>
> > > Feel free to send it early with just a handful of drivers implemented;
> > > I'm more interested about bpf/af_xdp/user api story; if we have some
> > > nice sample/test case that shows how the metadata can be used, that
> > > might push us closer to the agreement on the best way to proceed.
> >
> > I'll try to do a intel and mlx implementation to get a cross section.
> > I have a good collection of nics here so should be able to show a
> > couple firmware versions. It could be fine I think to have the raw
> > kptr access and then also kfuncs for some things perhaps.
> >
> > > > I'd prefer if we left the door open for new vendors.
> > > > Punting descriptor
> > > > parsing to user space will indeed result in what you just said - major
> > > > vendors are supported and that's it.
> >
> > I'm not sure about why it would make it harder for new vendors? I think
> > the opposite,
>
> TBH I'm only replying to the email because of the above part :)
> I thought this would be self evident, but I guess our perspectives
> are different.
>
> Perhaps you look at it from the perspective of SW running on someone
> else's cloud, an being able to move to another cloud, without having
> to worry if feature X is available in xdp or just skb.
>
> I look at it from the perspective of maintaining a cloud, with people
> writing random XDP applications. If I swap a NIC from an incumbent to a
> (superior) startup, and cloud users are messing with raw descriptor -
> I'd need to go find every XDP program out there and make sure it
> understands the new descriptors.

Here is another perspective:

As an AF_XDP application developer I don't want to deal with the underlying
hardware in detail. I'd like to request a feature from the OS (in this case
rx/tx timestamping). If the feature is available I will simply use it; if not,
I might have to work around it - maybe by falling back to SW timestamping. No
part of my application (BPF program included) should have to be
optimized/adjusted for all the different HW variants out there. My application
might run on bare metal/cloud/virtual systems. I do not want to handle these
scenarios differently.

I followed the idea of having a library for parsing the driver specific meta
information. That would mean that this library has to be kept in sync with the
kernel, right? It doesn't help if a newer kernel provides XDP hints support for
more devices/drivers but the library is not updated. That might be relevant for
all the device update strategies out there.

In addition - and maybe even contrary - we care about zero copy (ZC) support.
Our current use case has to deal with a lot of small packets, so we hope to
benefit from that. If XDP hints support requires a copy of the metadata - maybe
to drive a HW independent interface - that might be a bottleneck for us.

>
> There is a BPF foundation or whatnot now - what about starting a
> certification program for cloud providers and making it clear what
> features must be supported to be compatible with XDP 1.0, XDP 2.0 etc?
>
> > it would be easier because I don't need vendor support at all.
>
> Can you support the enfabrica NIC on day 1? :) To an extent, its just
> shifting the responsibility from the HW vendor to the middleware vendor.
>
> > Thinking it over seems there could be room for both.
>
> Are you thinking more or less Stan's proposal but with one of
> the callbacks being "give me the raw thing"? Probably as a ro dynptr?
> Possible, but I don't think we need to hold off Stan's work.
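
To make the application-side view from earlier in this mail a bit more
concrete (use the HW timestamp if the OS provides one, otherwise fall back to
SW timestamping), here is a rough sketch in plain C. Everything in it is an
assumption for illustration only: struct rx_hints, RX_HINT_TSTAMP and
rx_timestamp_ns() are hypothetical names, not an existing kernel, libbpf or
libxdp interface, and the layout says nothing about how the hints would
actually be delivered.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical HW-independent hint layout (assumption, not a real uAPI). */
#define RX_HINT_TSTAMP (1u << 0)

struct rx_hints {
	uint32_t valid;     /* bitmask of hints that were filled in */
	uint64_t hw_tstamp; /* RX HW timestamp in ns, valid if RX_HINT_TSTAMP */
};

/* SW fallback: timestamp taken when the application handles the packet. */
static uint64_t sw_timestamp_ns(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/*
 * Return an RX timestamp in ns. *from_hw is set to 1 if the value came
 * from the (hypothetical) HW hint and to 0 if the SW fallback was used.
 * A NULL hints pointer is treated as "no hints available".
 */
static uint64_t rx_timestamp_ns(const struct rx_hints *h, int *from_hw)
{
	assert(from_hw != NULL);

	if (h && (h->valid & RX_HINT_TSTAMP)) {
		*from_hw = 1;
		return h->hw_tstamp;
	}

	*from_hw = 0;
	return sw_timestamp_ns();
}
```

The point of the sketch is only that the application checks a validity bit and
never looks at a vendor specific descriptor; which side fills in such a
structure (kfuncs, a library, or something else) is exactly the open question
in this thread.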