Tue, Nov 21, 2023 at 04:21:44PM CET, jhs@xxxxxxxxxxxx wrote:
>On Tue, Nov 21, 2023 at 9:19 AM Jiri Pirko <jiri@xxxxxxxxxxx> wrote:
>>
>> Tue, Nov 21, 2023 at 02:47:40PM CET, jhs@xxxxxxxxxxxx wrote:
>> >On Tue, Nov 21, 2023 at 8:06 AM Jiri Pirko <jiri@xxxxxxxxxxx> wrote:
>> >>
>> >> Mon, Nov 20, 2023 at 11:56:50PM CET, jhs@xxxxxxxxxxxx wrote:
>> >> >On Mon, Nov 20, 2023 at 4:49 PM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
>> >> >>
>> >> >> On 11/20/23 8:56 PM, Jamal Hadi Salim wrote:
>> >> >> > On Mon, Nov 20, 2023 at 1:10 PM Jiri Pirko <jiri@xxxxxxxxxxx> wrote:
>> >> >> >> Mon, Nov 20, 2023 at 03:23:59PM CET, jhs@xxxxxxxxxxxx wrote:
>> >> >> >> [...]
>> >> >> >> >
>> >> >> tc BPF and XDP already have widely used infrastructure and can be developed
>> >> >> against libbpf or other user space libraries for a user space control plane.
>> >> >> With 'control plane' you refer here to the tc / netlink shim you've built,
>> >> >> but looking at the tc command line examples, this doesn't really provide a
>> >> >> good user experience (you call it p4 but people load bpf obj files). If the
>> >> >> expectation is that an operator should run tc commands, then it's neither
>> >> >> a nice experience for p4 nor for BPF folks. From a BPF PoV, we moved over
>> >> >> to bpf_mprog and plan to also extend this for XDP to have a common look and
>> >> >> feel wrt networking for developers. Why can't this be reused?
>> >> >
>> >> >The filter loading which loads the program is considered pipeline
>> >> >instantiation - consider it as "provisioning" more than "control",
>> >> >which runs at runtime. "control" is purely netlink based. The iproute2
>> >> >code we use links libbpf, for example, for the filter. If we can achieve
>> >> >the same with bpf_mprog then sure - we just don't want to lose
>> >> >functionality though. Off the top of my head, some sample space:
>> >> >- we could have multiple pipelines with different priorities (which tc
>> >> >provides to us) - and each pipeline may have its own logic with many
>> >> >tables etc (and the choice to iterate the next one is essentially
>> >> >encoded in the tc action codes)
>> >> >- we use tc block to map groups of ports (which I don't think bpf has
>> >> >internal access to)
>> >> >
>> >> >In regards to usability: no, I don't expect someone doing things at
>> >> >scale to use command-line tc. The APIs are via netlink. But the tc cli
>> >> >is a must for the rest of the masses per our traditions. Also I really
>> >>
>> >> I don't follow. You repeatedly mention "the must of the traditional tc
>> >> cli", but which parts of the existing traditional cli do you use for p4tc?
>> >> If I look at the examples, pretty much everything looks new to me.
>> >> Example:
>> >>
>> >> tc p4ctrl create myprog/table/mytable dstAddr 10.0.1.2/32 \
>> >>    action send_to_port param port eno1
>> >>
>> >> This is just TC/RTnetlink used as a channel to pass new things over. If
>> >> that is the case, what's traditional here?
>> >>
>> >
>> >
>> >What is not traditional about it?
>>
>> Okay, so in that case, the following example communicating with a
>> userspace daemon using an imaginary "p4ctrl" app is equally traditional:
>> $ p4ctrl create myprog/table/mytable dstAddr 10.0.1.2/32 \
>>      action send_to_port param port eno1
>
>Huh? That's just an application - classical tc, which is part of iproute2,
>that is sending to the kernel, no different than "tc flower.."
>Where do you get the "userspace" daemon part? Yes, you can write a
>daemon but it will use the same APIs as tc.

Okay, so which part is the "tradition"?
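
For reference, the "multiple pipelines at different priorities" and "tc
block to map groups of ports" points above map onto standard tc mechanics
roughly as follows; the device names, the block number and the p4
classifier arguments are illustrative only (not taken from the patch set),
and the trailing "..." stands for the rest of the p4 classifier arguments:

  # put two ports into the same shared tc block (standard tc/clsact)
  tc qdisc add dev eno1 ingress_block 21 clsact
  tc qdisc add dev eno2 ingress_block 21 clsact

  # attach two pipelines to that block at different priorities; the lower
  # prio number is evaluated first, and whether the next one is iterated
  # is encoded in the tc action return code
  tc filter add block 21 protocol all prio 10 p4 pname myprog1 ...
  tc filter add block 21 protocol all prio 20 p4 pname myprog2 ...
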
>
>> >> >
>> >> >didn't even want to use ebpf at all for operator experience reasons -
>> >> >it requires a compilation of the code and an extra loading compared to
>> >> >what our original u32/pedit code offered.
>> >> >
>> >> >> I don't quite follow why not most of this could be implemented entirely in
>> >> >> user space without the detour of this and you would provide a developer
>> >> >> library which could then be integrated into a p4 runtime/frontend? This
>> >> >> way users never interface with ebpf parts nor tc given they also shouldn't
>> >> >> have to - it's an implementation detail. This is what John was also pointing
>> >> >> out earlier.
>> >> >>
>> >> >
>> >> >Netlink is the API. We will provide a library for object manipulation
>> >> >which abstracts away the need to know netlink. Someone who for their
>> >> >own reasons wants to use p4runtime or TDI could write on top of this.
>> >> >I would not design a kernel interface to just meet p4runtime (we
>> >> >already have TDI, which came later and does things differently). So I
>> >> >expect us to support both of those. And if I were to do something on
>> >> >SDN that was more robust I would write my own that still uses these
>> >> >netlink interfaces.
>> >>
>> >> Actually, what Daniel says about the p4 library used as a backend to a p4
>> >> frontend is pretty much aligned with what I claimed on the p4 calls a couple
>> >> of times. If you have this p4 userspace tooling, it is easy for offloads to
>> >> replace the backend by a vendor-specific library which allows p4 offload
>> >> suitable for all vendors (your plan of p4tc offload does not work well
>> >> for our hw, as we repeatedly claimed).
>> >>
>> >That's you - NVIDIA. You have chosen a path away from the kernel
>> >towards DOCA. I understand NVIDIA's frustration with dealing with the
>> >upstream process (which has been cited to me as a good reason for
>> >DOCA) but please don't impose these values and your politics on other
>> >vendors (Intel, AMD for example) who are more than willing to invest
>> >in making the kernel interfaces the path forward. Your choice.
>>
>> No, you are missing the point. This has nothing to do with DOCA.
>
>Right Jiri ;->
>
>> This
>> has to do with the simple limitation of your offload assuming there are
>> no runtime changes in the compiled pipeline. For Intel, maybe there
>> aren't, and it's a good fit for them. All I say is that it is not a
>> good fit for everyone.
>
> a) it is not part of the P4 spec to dynamically make changes to the
>datapath pipeline after it is created and we are discussing a P4

Isn't this up to the implementation? I mean, from the p4 perspective,
everything is static. HW might need to reshuffle the pipeline internally
during rule insertion/removal in order to optimize the layout.

>implementation not an extension that would add more value b) We are
>more than happy to add extensions in the future to accommodate
>features but first the _P4 spec_ must be met c) we had longer discussions
>with Matty, Khalid and the Rice folks who wrote a paper on that topic
>which you probably didn't attend, and everything that needs to be done
>can be done from user space today for all those optimizations.
>
>Conclusion is: For what you need to do (which I don't believe is a
>limitation in your hardware but rather a design decision on your part),
>run your user space daemon, do optimizations and update the datapath.
>Everybody is happy.

Should the userspace daemon listen for inserted rules to be offloaded
over netlink?
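
On that last question: a userspace daemon of the kind described above
could, in principle, subscribe to the same tc netlink notifications that
iproute2 already exposes and react to rule insertions from there. As a
rough shell-level illustration only; the p4-specific objects would of
course need their own netlink handling, this just shows the standard tc
event stream:

  # print tc netlink events (qdisc/class/filter changes) as they arrive;
  # a daemon would bind the RTNLGRP_TC multicast group and handle the
  # same messages programmatically
  tc monitor
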
>
>>
>> >Nobody is stopping you from offering your customers proprietary
>> >solutions which include a specific ebpf approach alongside DOCA. We
>> >believe that a singular interface regardless of the vendor is the
>> >right way forward. IMHO, this siloing that unfortunately is also added
>> >by eBPF being a double-edged sword is not good for the community.
>> >
>> >> As I also said on the p4 call a couple of times, I don't see the kernel
>> >> as the correct place to do the p4 abstractions. Why don't you do it in
>> >> userspace and give vendors the possibility to have p4 backends with
>> >> compilers, runtime optimizations etc. in userspace, talking to the HW in
>> >> the vendor-suitable way too. Then the SW implementation could easily be
>> >> eBPF and the main reason (I believe) why you need to have this in TC
>> >> (offload) is then void.
>> >>
>> >> The "everyone wants to use TC/netlink" claim does not seem correct
>> >> to me. Why not have one Linux p4 solution that fits everyone's needs?
>> >
>> >You mean more fitting to the DOCA world? No, because I am a kernel
>>
>> Again, this has 0 relation to DOCA.
>>
>> >first person and kernel interfaces are good for everyone.
>>
>> Yeah, not really. The kernel is not always the right answer. Your/Intel
>> plan to handle the offload by:
>> 1) abuse devlink to flash the p4 binary
>> 2) parse the binary in kernel to match the table ids of rules coming
>> from the p4tc ndo_setup_tc
>> 3) abuse devlink to flash the p4 binary for tc-flower
>> 4) parse the binary in kernel to match the table ids of rules coming
>> from the tc-flower ndo_setup_tc
>> is really something that is making me a little bit nauseous.
>>
>> If you don't have a feasible plan to do the offload, p4tc does not make
>> sense to me, to be honest.
>
>You mean if there's no plan to match your (NVIDIA?) point of view.
>For #1 - how is this different from DDP? Wasn't that your suggestion to

I doubt that. Any flashing-blob-parsing-in-kernel is something I'm
opposed to from day 1.

>begin with? For #2, nobody is proposing to do anything of the sort. The
>ndo is passed IDs for the objects and associated contents. For #3+#4

During offload, you need to parse the blob in the driver to be able to
match the ids with blob entities. That was presented by you/Intel in the
past IIRC.

>the tc flower thing has nothing to do with P4TC; that was just some random
>proposal someone made seeing if they could ride on top of P4TC.

Yeah, it's not yet merged and already mentally used for abuse. I love
that :)

>
>Besides this, nobody really has to satisfy your point of view - like I
>said earlier, feel free to provide proprietary solutions. From a
>consumer perspective I would not want to deal with 4 different
>vendors with 4 different proprietary approaches. The kernel is the
>unifying part. You seemed happier with tc flower, just not with the

Yeah, that is my point: why can't the unifying part be a userspace
daemon/library with multiple backends (p4tc, bpf, vendorX, vendorY, ..)?
I just don't see the kernel as a good fit for the abstraction here, given
the fact that the vendor compilers do not run in the kernel. That is
breaking your model.

>kernel process - which is ironically the same thing we are going
>through here ;->
>
>cheers,
>jamal
>
>>
>> >
>> >cheers,
>> >jamal