Re: [PATCH net-next v12 00/15] Introducing P4TC (series 1)

Tom Herbert <tom@xxxxxxxxxx> · Sun, 3 Mar 2024 08:31:11 -0800

On Sat, Mar 2, 2024 at 7:15 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
>
> On Fri, 1 Mar 2024 18:20:36 -0800 Tom Herbert wrote:
> > This is configurability versus programmability. The table driven
> > approach as input (configurability) might work fine for generic
> > match-action tables up to the point that tables are expressive enough
> > to satisfy the requirements. But parsing doesn't fall into the table
> > driven paradigm: parsers want to be *programmed*. This is why we
> > removed kParser from this patch set and fell back to eBPF for parsing.
> > But the problem we quickly hit is that eBPF is not offloadable to
> > network devices. For example, when we compile P4 into an eBPF parser
> > we've lost the declarative representation that parsers in the devices
> > could consume (they're not CPUs running eBPF).
> >
> > I think the key here is what we mean by kernel offload. When we do
> > kernel offload, is it the kernel implementation or the kernel
> > functionality that's being offloaded? If it's the latter then we have
> > a lot more flexibility. What we'd need is a safe and secure way to
> > synchronize with that offload device that precisely supports the
> > kernel functionality we'd like to offload. This can be done if both
> > the kernel bits and programmed offload are derived from the same
> > source (i.e. tag source code with a sha-1). For example, if someone
> > writes a parser in P4, we can compile that into both eBPF and a P4
> > backend using independent tool chains and program download. At
> > runtime, the kernel can safely offload the functionality of the eBPF
> > parser to the device if its hash matches the one reported by the
> > device.
>
> Good points. If I understand you correctly you're saying that parsers
> are more complex than just a basic parsing tree a'la u32.

Yes. Parsing things like TLVs, the GRE flags field, or nested protobufs
doesn't map well onto u32. We also want the advantages of compiler
optimizations to unroll loops, squash nodes in the parse graph, etc.
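
To make the TLV point concrete, here is a minimal sketch of a TLV walk.
It assumes a made-up encoding (1-byte type, 1-byte value length, then the
value) and is not taken from any real protocol or from the patch set; the
function name and the max_options bound are illustrative only. The offset
of each option depends on the lengths of the options before it, which is
what a fixed offset/mask match cannot express. The loop is bounded so
that a compiler could unroll it.

```python
import struct


def parse_tlvs(buf: bytes, max_options: int = 8):
    """Walk a run of (type, length, value) options in buf.

    Hypothetical encoding: 1-byte type, 1-byte value length, value bytes.
    Returns a list of (type, value) tuples.
    """
    options = []
    off = 0
    # Bounded loop: at most max_options iterations, so it can be unrolled.
    for _ in range(max_options):
        if off == len(buf):
            break
        if off + 2 > len(buf):
            raise ValueError("truncated TLV header")
        tlv_type, tlv_len = struct.unpack_from("!BB", buf, off)
        off += 2
        if off + tlv_len > len(buf):
            raise ValueError("truncated TLV value")
        options.append((tlv_type, buf[off:off + tlv_len]))
        # The next header's offset is data dependent.
        off += tlv_len
    if off != len(buf):
        raise ValueError("more options than max_options")
    return options
```

For example, parse_tlvs(bytes([1, 2, 0xAA, 0xBB, 7, 0, 9, 1, 0xCC]))
yields three options, the third of which starts at offset 6 only because
the first two happened to be 4 and 2 bytes long.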

> Then we can take this argument further. P4 has grown to encompass a lot
> of functionality of quite complex devices. How do we square that with
> the kernel functionality offload model. If the entire device is modeled,
> including f.e. TSO, an offload would mean that the user has to write
> a TSO implementation which they then load into TC? That seems odd.
>
> IOW I don't quite know how to square in my head the "total
> functionality" with being a TC-based "plugin".

Hi Jakub,

I believe the solution is to replace kernel code with eBPF in cases
where we need programmability. This effectively means that we would
ship eBPF code as part of the kernel. So in the case of TSO, the
kernel would include a standard implementation in eBPF that could be
compiled into the kernel by default. The restricted C source code is
tagged with a hash, so if someone wants to offload TSO they could
compile the source into their target and retain the hash. At runtime
it's a matter of querying the driver to see if the device supports the
TSO program the kernel is running by comparing hash values. Scaling
this, a device could support a catalogue of programs: TSO, LRO,
parser, iptables, etc. If the kernel can match the hash of its eBPF
code to one reported by the driver, then it can assume the functionality
is offloadable. This is an elaboration of "device features", but instead
of the device claiming that it supports an adequate GRO implementation
by reporting NETIF_F_GRO, the device would tell the kernel that it not
only supports GRO but provides functionality identical to the kernel's
GRO (which IMO is the first requirement of kernel offload).
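
The hash comparison can be sketched in a few lines. This is a userspace
model only, under stated assumptions: the Device class, source_tag() and
can_offload() are hypothetical names, not existing kernel or driver
interfaces, and SHA-256 is an arbitrary choice of hash. The point it
illustrates is that the tag is computed over the source, so the eBPF
build and the device build of the same source carry the same tag.

```python
import hashlib


def source_tag(source: str) -> str:
    """Hash of the program source; every build of that source keeps it."""
    # Assumption: SHA-256 over the source text. Any stable hash would do.
    return hashlib.sha256(source.encode()).hexdigest()


class Device:
    """Hypothetical device/driver reporting the programs it implements."""

    def __init__(self, catalogue):
        # name -> source tag retained when the program was compiled
        # for the device target
        self.catalogue = dict(catalogue)

    def reported_hash(self, name):
        return self.catalogue.get(name)


def can_offload(kernel_programs, device, name):
    """True only if the device reports the same source tag the kernel runs."""
    kernel_hash = kernel_programs.get(name)
    return kernel_hash is not None and kernel_hash == device.reported_hash(name)


# Placeholder strings standing in for restricted C sources.
tso_v1 = "/* TSO program source, v1 */"
tso_v2 = "/* TSO program source, v2 */"
gro = "/* GRO program source */"

kernel = {"tso": source_tag(tso_v2), "gro": source_tag(gro)}
nic = Device({"tso": source_tag(tso_v1), "gro": source_tag(gro)})

print(can_offload(kernel, nic, "gro"))  # True: same source on both sides
print(can_offload(kernel, nic, "tso"))  # False: device was built from v1
```

Note the TSO case: the device does implement TSO, but not the TSO the
kernel is running, so in this model it is not treated as offloadable.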

Even before considering hardware offload, I think this approach
addresses a more fundamental problem: making the kernel programmable.
Since the code is in eBPF, the kernel can be reprogrammed at runtime,
and that could be controlled by TC. This allows local customization of
kernel features, and is also the simplest way to "patch" the kernel
with security and bug fixes (nobody is ever excited to do a kernel
rebase in their datacenter!). Flow dissector is a prime candidate for
this, and I am still planning to replace it with an all-eBPF program
(https://netdevconf.info/0x15/slides/16/Flow%20dissector_PANDA%20parser.pdf).

Tom