On 03/04, Jamal Hadi Salim wrote:
> On Mon, Mar 4, 2024 at 5:23 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> >
> > On 03/04, Jamal Hadi Salim wrote:
> > > On Mon, Mar 4, 2024 at 4:23 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
> > > >
> > > > On 03/03, Jamal Hadi Salim wrote:
> > > > > On Sun, Mar 3, 2024 at 1:11 PM Tom Herbert <tom@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Sun, Mar 3, 2024 at 9:00 AM Jamal Hadi Salim <jhs@xxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Sat, Mar 2, 2024 at 10:27 PM Jakub Kicinski <kuba@xxxxxxxxxx> wrote:
> > > > > > > >
> > > > > > > > On Sat, 2 Mar 2024 09:36:53 -0500 Jamal Hadi Salim wrote:
> > > > > > > > > 2) Your point on: "integrate later", or at least "fill in the gaps"
> > > > > > > > > This part I am probably going to mumble on. I am going to consider more than just doing ACLs/MAT via flower/u32 for the sake of discussion.
> > > > > > > > > True, "fill the gaps" has been our model so far. It requires kernel changes, user space code changes, etc., justifiably so, because most of the time such datapaths are subject to standardization via IETF, IEEE, etc., and new extensions come in on a regular basis. And sometimes we do add features that only one or two users or a single vendor need, at the cost of kernel and user/control extensions. Given our work process, any features added this way take a long time to make it to the end user.
> > > > > > > >
> > > > > > > > What I had in mind was more of a DDP model. The device loads its binary FW blob in whatever way it does, then it tells the kernel its parser graph and tables. The kernel exposes those tables to user space. All dynamic, no need to change the kernel for each new protocol.
> > > > > > > >
> > > > > > > > But that's different in two ways:
> > > > > > > > 1. the device tells the kernel the tables, no "dynamic reprogramming"
> > > > > > > > 2. you don't need the SW side, the only use of the API is to interact with the device
> > > > > > > >
> > > > > > > > Users can still use BPF kfuncs to look up in the tables (like in FIB), but call them from cls_bpf.
> > > > > > > >
> > > > > > >
> > > > > > > This is not far off from what is envisioned today in the discussions. The main issue is: who loads the binary? We went from devlink to the filter doing the loading. DDP is ethtool. We still need to tie a PCI device/tc block to the "program" so we can do skip_sw and it works. Meaning a device that is capable of handling multiple programs can have multiple blobs loaded. A "program" is mapped to a tc filter, and MAT control works the same way as it does today (netlink/tc ndo).
> > > > > > >
> > > > > > > A program in P4 has a name and an ID, and people have been suggesting a sha1 identity (or a signature of some kind should be generated by the compiler). So the upward propagation could be tied to discovering this 3-tuple from the driver. Then the control plane targets a program via that tuple over netlink (as we do currently).
> > > > > > >
> > > > > > > I do note, using the DDP sample space, that currently whatever gets loaded is "trusted", and really you need human knowledge of what the NIC's parsing + MAT is in order to send the control. With P4 that is all visible/programmable by the end user (I am not a proponent of vendors "shipping" things or calling them for support), so it should be sufficient to just discover what is in the binary and send the correct control messages down.
> > > > > > >
> > > > > > > > I think in P4 terms that may be something more akin to only providing the runtime API? I seem to recall they had some distinction...
> > > > > > >
> > > > > > > There are several solutions out there (ex: TDI, P4Runtime) - our API is netlink and those could be written on top of netlink, there's no controversy there.
> > > > > > > So the starting point is defining the datapath using P4, generating the binary blob and whatever constraints are needed using the vendor backend, and, for the s/w equivalent, generating the eBPF datapath.
> > > > > > >
> > > > > > > > > At the cost of this sounding controversial, I am going to call things like fdb, fib, etc. which have fixed datapaths in the kernel "legacy". These "legacy" datapaths almost all the time have
> > > > > > > >
> > > > > > > > The cynic in me sometimes thinks that the biggest problem with "legacy" protocols is that it's hard to make money on them :)
> > > > > > >
> > > > > > > That's a big motivation without a doubt, but there are also people who want to experiment with things. One of the craziest examples we have is someone who created a P4 program for an "in-network calculator", essentially a calculator in the datapath. You send it two operands and an operator using custom headers; it does the math and responds with the result in a new header. By itself this program is a toy, but it demonstrates that if one wanted to, they could have something custom in hardware and/or the kernel datapath.
> > > > > >
> > > > > > Jamal,
> > > > > >
> > > > > > Given how long P4 has been around, it's surprising that the best publicly available code example is "the network calculator" toy.
> > > > > >
> > > > > Come on Tom ;-> That was just an example of something "crazy" to demonstrate freedom. I can run that on any of the P4-friendly NICs today. You are probably being facetious - there are some serious publicly available projects out there, some of which I cite in the cover letter (like DASH).
> > > >
> > > > Shameless plug. I have an even crazier example with bpf:
> > > >
> > > > https://github.com/fomichev/xdp-btc-miner
> > > >
> > >
> > > Hrm - this looks crazy interesting ;-> Tempting. I guess to port this to P4 we'd need sha256 in h/w (which most of these vendors already have). Is there any other acceleration you would need? It would have been more fun if you had invented your own headers too ;->
> >
> > Yeah, some way to do sha256(sha256(at_some_fixed_packet_offset + 80 bytes))
>
> This part is straightforward.
>
> > is one thing. And the other is some way to compare that sha256 against some hard-coded (difficulty) number (as a 256-bit uint).
>
> The compiler may have issues with this comparison - will have to look (I am pretty sure it's fixable though).
>
> > But I have no clue how well that maps into the declarative p4 language. Most likely possible if you're saying that the calculator is possible?
>
> The calculator is basically written as a set of match-action tables. You parse your header, construct a key based on the operator field of the header (eg "+"), and invoke an action which takes the operands from the headers (eg "1" and "2"); the action returns the result ("3"). You stash the result in a new packet and send it back to the source.
>
> So my thinking is the computation you need would be modelled as an action.
>
> > I'm assuming that even sha256 can possibly be implemented in p4 without any extra support from the vendor? It's just a bunch of xors and rotations over a fixed-size input buffer. [..]
>
> True, and I think those would be fast. But if the h/w offers it as an interface, why not.
> It's not that you are running out of instruction space - and my memory is hazy - but iirc, there is sha256 support in the kernel Crypto API - does it not make sense to kfunc into that?

Oh yeah, that's definitely a better path if somebody were to do it "properly". It's still fun, though, to see how far we can push the bpf vm/verifier without using any extra helpers :-D
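The calculator's match-action structure, as Jamal describes it, has four steps: parse the header, key a table on the operator field, run an action on the operands, and stash the result. A minimal plain-C sketch of that structure follows. It is not the P4 program mentioned in the thread, nor any P4TC or kernel API. The `calc_hdr` layout, the table contents, and all function names are hypothetical, and the header is modeled as a host-order struct rather than parsed from packet bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical custom header: two operands, an operator, and a result slot. */
struct calc_hdr {
    char    op;     /* operator, e.g. '+' */
    int32_t a;      /* first operand */
    int32_t b;      /* second operand */
    int32_t result; /* written by the action */
};

/* An action takes the operands and returns the result (no overflow handling). */
typedef int32_t (*calc_action)(int32_t a, int32_t b);

static int32_t act_add(int32_t a, int32_t b) { return a + b; }
static int32_t act_sub(int32_t a, int32_t b) { return a - b; }
static int32_t act_mul(int32_t a, int32_t b) { return a * b; }

/* Match-action table entry: exact match on the operator field. */
struct calc_entry {
    char        key;
    calc_action action;
};

static const struct calc_entry calc_table[] = {
    { '+', act_add },
    { '-', act_sub },
    { '*', act_mul },
};

/* Look up the operator; on a hit, run the action and stash the result.
 * Returns 0 on a hit and -1 on a miss (standing in for a "drop" default action). */
static int calc_apply(struct calc_hdr *h)
{
    assert(h != NULL);
    for (size_t i = 0; i < sizeof(calc_table) / sizeof(calc_table[0]); i++) {
        if (calc_table[i].key == h->op) {
            h->result = calc_table[i].action(h->a, h->b);
            return 0;
        }
    }
    return -1;
}
```

For example, a header with `op = '+'`, `a = 1`, `b = 2` hits the first entry and gets `result = 3`. An unknown operator misses the table. The linear exact-match scan is only there because the table has three entries; a real target would use whatever lookup structure its backend provides.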
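Stanislav notes that sha256 is "just a bunch of xors and rotations over a fixed-size input buffer". The proof-of-work check he describes is a double sha256 over an 80-byte header, compared against a 256-bit difficulty target. Both can be illustrated in portable userspace C, written from the FIPS 180-4 description of SHA-256. This sketch is not the xdp-btc-miner code, not BPF, and not P4.

The byte order of the comparison is an assumption: as in Bitcoin, the digest and target are treated as little-endian 256-bit integers, with the most significant byte at index 31. A hash passes when it is less than or equal to the target.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* SHA-256 round constants (FIPS 180-4, section 4.2.2). */
static const uint32_t K[64] = {
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
};

static inline uint32_t rotr(uint32_t x, unsigned n) { return (x >> n) | (x << (32 - n)); }

/* Compress one 64-byte block into the state: only shifts, rotations, xor/and/not, and adds. */
static void sha256_block(uint32_t s[8], const uint8_t blk[64])
{
    uint32_t w[64];
    for (int t = 0; t < 16; t++)
        w[t] = (uint32_t)blk[4 * t] << 24 | (uint32_t)blk[4 * t + 1] << 16 |
               (uint32_t)blk[4 * t + 2] << 8 | (uint32_t)blk[4 * t + 3];
    for (int t = 16; t < 64; t++) {
        uint32_t s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3);
        uint32_t s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10);
        w[t] = w[t - 16] + s0 + w[t - 7] + s1;
    }
    uint32_t a = s[0], b = s[1], c = s[2], d = s[3], e = s[4], f = s[5], g = s[6], h = s[7];
    for (int t = 0; t < 64; t++) {
        uint32_t t1 = h + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25)) + ((e & f) ^ (~e & g)) + K[t] + w[t];
        uint32_t t2 = (rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22)) + ((a & b) ^ (a & c) ^ (b & c));
        h = g; g = f; f = e; e = d + t1; d = c; c = b; b = a; a = t1 + t2;
    }
    s[0] += a; s[1] += b; s[2] += c; s[3] += d;
    s[4] += e; s[5] += f; s[6] += g; s[7] += h;
}

/* One-shot SHA-256 of len bytes. */
static void sha256(const uint8_t *msg, size_t len, uint8_t out[32])
{
    assert(msg != NULL || len == 0);
    uint32_t s[8] = { 0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a,
                      0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19 };
    size_t full = len / 64, rem = len % 64;
    for (size_t i = 0; i < full; i++)
        sha256_block(s, msg + 64 * i);

    /* Padding: 0x80, zeros, then the 64-bit big-endian bit length (1 or 2 blocks). */
    uint8_t tail[128] = { 0 };
    if (rem)
        memcpy(tail, msg + 64 * full, rem);
    tail[rem] = 0x80;
    size_t tail_len = rem < 56 ? 64 : 128;
    uint64_t bits = (uint64_t)len * 8;
    for (int i = 0; i < 8; i++)
        tail[tail_len - 1 - i] = (uint8_t)(bits >> (8 * i));
    for (size_t off = 0; off < tail_len; off += 64)
        sha256_block(s, tail + off);

    for (int i = 0; i < 8; i++) {
        out[4 * i] = (uint8_t)(s[i] >> 24);
        out[4 * i + 1] = (uint8_t)(s[i] >> 16);
        out[4 * i + 2] = (uint8_t)(s[i] >> 8);
        out[4 * i + 3] = (uint8_t)s[i];
    }
}

/* Double SHA-256, as applied to an 80-byte block header. */
static void sha256d(const uint8_t *msg, size_t len, uint8_t out[32])
{
    uint8_t inner[32];
    sha256(msg, len, inner);
    sha256(inner, sizeof(inner), out);
}

/* 256-bit compare: 1 if hash <= target, both little-endian (byte 31 is most significant). */
static int hash_meets_target(const uint8_t hash[32], const uint8_t target[32])
{
    for (int i = 31; i >= 0; i--)
        if (hash[i] != target[i])
            return hash[i] < target[i];
    return 1;
}
```

Every step in `sha256_block` is a shift, rotation, xor/and/not, or 32-bit addition over a fixed 64-word schedule. That is why the computation fits a bounded loop. `hash_meets_target` is the kind of wide comparison Jamal expected the P4 compiler might have trouble with: it walks the 32 bytes from most to least significant and stops at the first difference.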
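Jamal suggests that a loaded program be identified by its name, its ID, and a compiler-generated sha1-style digest, with the control plane targeting a program by that tuple. That idea can be pictured as a simple lookup. The sketch below is hypothetical: `struct prog_ident`, `prog_lookup`, and the field sizes are illustrative only. They do not correspond to the P4TC netlink uAPI or to any driver interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PROG_NAME_MAX   32
#define PROG_DIGEST_LEN 20 /* sha1-sized digest, as suggested in the thread */

/* Hypothetical identity a driver could report for each loaded program blob. */
struct prog_ident {
    char     name[PROG_NAME_MAX];
    uint32_t id;
    uint8_t  digest[PROG_DIGEST_LEN];
};

/* Return the loaded program matching name, ID, and digest, or NULL if none does. */
static const struct prog_ident *prog_lookup(const struct prog_ident *loaded, size_t n,
                                            const struct prog_ident *want)
{
    assert(want != NULL && (loaded != NULL || n == 0));
    for (size_t i = 0; i < n; i++) {
        if (loaded[i].id == want->id &&
            strncmp(loaded[i].name, want->name, PROG_NAME_MAX) == 0 &&
            memcmp(loaded[i].digest, want->digest, PROG_DIGEST_LEN) == 0)
            return &loaded[i];
    }
    return NULL;
}
```

Matching on all three fields has a practical benefit. Two blobs that share a name and an ID but were compiled differently are still told apart by the digest, so control messages cannot silently land on the wrong program.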