On Sat, Mar 2, 2024 at 7:15 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Fri, 1 Mar 2024 18:20:36 -0800 Tom Herbert wrote:
> > This is configurability versus programmability. The table driven
> > approach as input (configurability) might work fine for generic
> > match-action tables up to the point that tables are expressive enough
> > to satisfy the requirements. But parsing doesn't fall into the table
> > driven paradigm: parsers want to be *programmed*. This is why we
> > removed kParser from this patch set and fell back to eBPF for parsing.
> > But the problem we quickly hit that eBPF is not offloadable to network
> > devices, for example when we compile P4 in an eBPF parser we've lost
> > the declarative representation that parsers in the devices could
> > consume (they're not CPUs running eBPF).
> >
> > I think the key here is what we mean by kernel offload. When we do
> > kernel offload, is it the kernel implementation or the kernel
> > functionality that's being offloaded? If it's the latter then we have
> > a lot more flexibility. What we'd need is a safe and secure way to
> > synchronize with that offload device that precisely supports the
> > kernel functionality we'd like to offload. This can be done if both
> > the kernel bits and programmed offload are derived from the same
> > source (i.e. tag source code with a sha-1). For example, if someone
> > writes a parser in P4, we can compile that into both eBPF and a P4
> > backend using independent tool chains and program download. At
> > runtime, the kernel can safely offload the functionality of the eBPF
> > parser to the device if it matches the hash to that reported by the
> > device
>
> Good points. If I understand you correctly you're saying that parsers
> are more complex than just a basic parsing tree a'la u32.

Yes. Parsing things like TLVs, GRE flag field, or nested protobufs isn't
conducive to u32.
We also want the advantages of compiler optimizations to unroll loops,
squash nodes in the parse graph, etc.

> Then we can take this argument further. P4 has grown to encompass a lot
> of functionality of quite complex devices. How do we square that with
> the kernel functionality offload model. If the entire device is modeled,
> including f.e. TSO, an offload would mean that the user has to write
> a TSO implementation which they then load into TC? That seems odd.
>
> IOW I don't quite know how to square in my head the "total
> functionality" with being a TC-based "plugin".

Hi Jakub,

I believe the solution is to replace kernel code with eBPF in cases
where we need programmability. This effectively means that we would
ship eBPF code as part of the kernel. So in the case of TSO, the kernel
would include a standard implementation in eBPF that could be compiled
into the kernel by default. The restricted C source code is tagged with
a hash, so if someone wants to offload TSO they could compile the
source into their target and retain the hash. At runtime it's a matter
of querying the driver to see if the device supports the TSO program
the kernel is running by comparing hash values.

Scaling this, a device could support a catalogue of programs: TSO, LRO,
parser, IPtables, etc. If the kernel can match the hash of its eBPF
code to one reported by the driver then it can assume the functionality
is offloadable. This is an elaboration of "device features", but
instead of the device telling us it thinks it supports an adequate GRO
implementation by reporting NETIF_F_GRO, the device would tell the
kernel that it not only supports GRO but provides functionality
identical to the kernel's GRO (which IMO is the first requirement of
kernel offload).

Even before considering hardware offload, I think this approach
addresses a more fundamental problem: making the kernel programmable.
Since the code is in eBPF, the kernel can be reprogrammed at runtime,
which could be controlled by TC.
This allows local customization of kernel features, but is also the
simplest way to "patch" the kernel with security and bug fixes (nobody
is ever excited to do a kernel rebase in their datacenter!). Flow
dissector is a prime candidate for this, and I am still planning to
replace it with an all-eBPF program
(https://netdevconf.info/0x15/slides/16/Flow%20dissector_PANDA%20parser.pdf).

Tom
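
To make the hash-matching idea concrete, here is a minimal Python sketch of the check, not kernel code. It assumes the kernel keeps the restricted C source of each program and the driver reports a catalogue of source hashes. The names `source_tag`, `offloadable`, `kernel_programs`, and `device_catalogue` are hypothetical, and the source strings are placeholders. SHA-1 is used only because it was the example tag mentioned earlier in the thread.

```python
import hashlib


def source_tag(source: bytes) -> str:
    """Tag a program by hashing its source (SHA-1, as floated in the thread)."""
    return hashlib.sha1(source).hexdigest()


def offloadable(kernel_programs: dict, device_catalogue: dict) -> dict:
    """Return, per program, whether the device reports the same source hash.

    kernel_programs:  name -> source bytes the kernel's eBPF was built from
    device_catalogue: name -> hash reported by the (hypothetical) driver query
    """
    return {
        name: device_catalogue.get(name) == source_tag(src)
        for name, src in kernel_programs.items()
    }


# Placeholder sources; real ones would be the restricted C files.
TSO_SRC = b"/* placeholder: reference TSO program */"
PARSER_SRC = b"/* placeholder: parser, current version */"
PARSER_SRC_OLD = b"/* placeholder: parser, older version */"

kernel_programs = {"tso": TSO_SRC, "parser": PARSER_SRC}

# The device was built from the same TSO source but an older parser source.
device_catalogue = {
    "tso": source_tag(TSO_SRC),
    "parser": source_tag(PARSER_SRC_OLD),
}

result = offloadable(kernel_programs, device_catalogue)
print(result)
```

In this sketch only the TSO program would be treated as offloadable; the parser would keep running in the kernel's eBPF because the device's hash does not match.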