> None of the above requires P4TC. For different architectures you
> build optimal backend compilers. You have a Xilinx backend, an
> Intel backend, and a Linux CPU based backend. I see no reason to
> constrain the software case to map to a pipeline model, for
> example. Software running on a CPU has very different
> characteristics from something running on a TOR or an FPGA.
> Trying to push all of these into one backend "model" will result
> in a suboptimal result for every target. At the end of the day,
> my $.02: P4 is a DSL; it needs a target-dependent compiler in
> front of it. I want to optimize my software pipeline: the
> compiler should compress tables as much as possible and search
> for an O(1) lookup, even if getting that key is somewhat
> expensive. Conversely, a TCAM changes the game. An FPGA is going
> to be flexible and make lots of tradeoffs here, which I'm not an
> expert on. Also, by avoiding loading the DSL into the kernel you
> leave room for others to build new/better/worse DSLs as they
> please.

I think the general ask here is to define an Intermediate Representation that describes a programmed data path as a combination of declarative and imperative elements (parsers and table descriptions are better expressed declaratively; functional logic seems more imperative). We also want references to accelerators with dynamic runtime binding to hardware (there are some interesting tricks we can do in the loader for a CPU target-- we'll talk about those at Netdev). With a good IR we can decouple the frontend from the backend target, which enables mixing and matching programming languages with arbitrary HW or SW targets. So a good IR potentially enables a lot of flexibility and freedom on both sides of the equation.

An IR also facilitates reasonable kernel offload via signing images with a hash of the IR. So, for instance, a frontend compiler could compile a P4 program into the IR. That code could then be compiled into a SW target, say eBPF, and maybe P4 hardware. Each image carries the hash of the IR. At runtime, the eBPF code could be loaded into the kernel. The hardware image can be loaded into the device using a side-band mechanism. To offload, we would query the device-- if the hash reported by the device matches the hash in the eBPF image, then we know the offload is viable. No JITs, no pushing firmware bits through the kernel, no need for device capability flags, and it avoids the pitfalls of TC flower.

There is one challenge here: how to deal with offloads that are already integrated into the kernel. I think GRO is a great example. GRO has been especially elusive as an offload since it requires a device to autonomously parse packets on input. We really want a GRO offload that parses exactly the same protocols the kernel does (including encapsulations), but also implements exactly the same logic for timers and for pushing reassembled segments. So this needs to be programmable. The problem with the technique I described is that GRO is integrated into the kernel, so we have no basis for a hash. I think the answer here is to start replacing fixed kernel C code with eBPF, even in the critical path (we already talked about replacing the flow dissector with eBPF).

Anyway, we have been working on this. There's the Common Parser Representation in JSON (formerly known as CPL, which we talked about at Netdev). For execution logic, LLVM IR seems fine (by the way, MLIR is really useful!). We're just starting to look at tables (probably also JSON). If there's interest I could share more...

Tom
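[Editor's illustration of the IR-hash offload check described above. Every name here is hypothetical and not an existing kernel or driver API; it is only a minimal sketch of the control flow: the offload is enabled only when the IR hash the device reports matches the IR hash the frontend compiler embedded in the eBPF image, otherwise we fall back to the software path.]

```c
/* Hypothetical sketch, not a real API: enable offload only when the
 * device and the eBPF image were derived from the same IR.
 */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define IR_HASH_LEN 32			/* e.g. SHA-256 of the IR */

struct sw_image {
	unsigned char ir_hash[IR_HASH_LEN];	/* hash recorded by the frontend compiler */
	/* ... eBPF program bytes, maps, etc. ... */
};

/* Placeholder for querying the device for the IR hash of the image it is
 * running (that image was loaded out of band via a side-band mechanism).
 */
static int device_query_ir_hash(int devfd, unsigned char *hash, size_t len)
{
	(void)devfd;
	memset(hash, 0, len);	/* stub: a real driver would fill this in */
	return 0;
}

static bool offload_viable(int devfd, const struct sw_image *img)
{
	unsigned char dev_hash[IR_HASH_LEN];

	if (device_query_ir_hash(devfd, dev_hash, sizeof(dev_hash)) < 0)
		return false;	/* cannot verify: stay on the software path */

	/* Offload only if both images carry the same IR hash. */
	return memcmp(dev_hash, img->ir_hash, sizeof(dev_hash)) == 0;
}

int main(void)
{
	struct sw_image img = { .ir_hash = { 0 } };

	printf("offload %s\n", offload_viable(-1, &img) ? "viable" : "not viable");
	return 0;
}
```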