On Thu, Jan 25, 2024 at 10:47 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote: > > On 1/24/24 3:40 PM, Jamal Hadi Salim wrote: > > On Wed, Jan 24, 2024 at 8:59 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote: > >> On 1/22/24 8:48 PM, Jamal Hadi Salim wrote: > [...] > >>> > >>> It should also be noted that it is feasible to split some of the ingress > >>> datapath into XDP first and more into TC later (as was shown above for > >>> example where the parser runs at XDP level). YMMV. > >>> Regardless of choice of which scheme to use, none of these will affect > >>> UAPI. It will all depend on whether you generate code to load on XDP vs > >>> tc, etc. > >>> > >>> Co-developed-by: Victor Nogueira <victor@xxxxxxxxxxxx> > >>> Signed-off-by: Victor Nogueira <victor@xxxxxxxxxxxx> > >>> Co-developed-by: Pedro Tammela <pctammela@xxxxxxxxxxxx> > >>> Signed-off-by: Pedro Tammela <pctammela@xxxxxxxxxxxx> > >>> Signed-off-by: Jamal Hadi Salim <jhs@xxxxxxxxxxxx> > >> > >> My objections from last iterations still stand, and I also added a nak, > >> so please do not just drop it with new revisions.. from the v10 as you > >> wrote you added further code but despite the various community feedback > >> the design still stands as before, therefore: > >> > >> Nacked-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx> > > > > We didnt make code changes - but did you read the cover letter and the > > extended commentary in this patch's commit log? We should have > > mentioned it in the changes log. It did respond to your comments. > > There's text that says "the filter manages the lifetime of the > > pipeline" - which in the future could include not only tc but XDP but > > also the hardware path (in the form of a file that gets loaded). I am > > not sure if that message is clear. Your angle being this is layer > > violation. In the last discussion i asked you for suggestions and we > > went the tcx route, which didnt make sense, and then you didnt > > respond. > [...] > > >> Also as mentioned earlier I don't think tc should hold references on > >> XDP programs in here. It doesn't make any sense aside from the fact > >> that the cls_p4 is also not doing anything with it. This is something > >> that a user space control plane should be doing i.e. managing a XDP > >> link on the target device. > > > > This is the same argument about layer violation that you made earlier. > > The filter manages the p4 pipeline - i.e it's not just about the ebpf > > blob(s) but for example in the future (discussions are still ongoing > > with vendors who have P4 NICs) a filter could be loaded to also > > specify the location of the hardware blob. > > Ah, so there is a plan to eventually add HW offload support for cls_p4? > Or is this only specifiying a location of a blob through some opaque > cookie value from user space? Current thought process is it will be something along these lines (the commit provides more details): tc filter add block 22 ingress protocol all prio 1 p4 pname simple_l3 \ prog type hw filename "mypnameprog.o" ... \ prog type xdp obj $PARSER.o section parser/xdp pinned_link /sys/fs/bpf/mylink \ action bpf obj $PROGNAME.o section prog/tc-ingress These discussions are still ongoing - but that is the current consensus. Note: we are not pushing any code for that, but hope it paints the bigger picture.... The idea is the cls p4 owns the lifetime of the pipeline. Installing the filter instantiates the p4 pipeline "simple_l3" and triggers a lot of the refcounts to make sure the pipeline and its components stays alive. There could be multiple such filters - when someone deletes the last filter, then it is safe to delete the pipeline. Essentially the filter manages the lifetime of the pipeline. > > I would be happy with a suggestion that gets us moving forward with > > that context in mind. > > My question on the above is mainly what does it bring you to hold a > reference on the XDP program? There is no guarantee that something else > will get loaded onto XDP, and then eventually the cls_p4 is the only > entity holding the reference but w/o 'purpose'. We do have BPF links > and the user space component orchestrating all this needs to create > and pin the BPF link in BPF fs, for example. An artificial reference > on XDP prog feels similar as if you'd hold a reference on an inode > out of tc.. Again, that should be delegated to the control plane you > have running interacting with the compiler which then manages and > loads its artifacts. What if you would also need to set up some > netfilter rules for the SW pipeline, would you then embed this too? Sorry, a slight tangent first: P4 is self-contained, there are a handful of objects that are defined by the spec (externs, actions, tables, etc) and we model them in the patchset, so that part is self-contained. For the extra richness such as the netfilter example you quoted - based on my many years of experience deploying SDN - using daemons(sorry if i am reading too much in what I think you are implying) for control is not the best option i.e you need all kinds of coordination - for example where do you store state, what happens when the daemon dies, how do you graceful restarts etc. Based on that, if i can put things in the kernel (which is essentially a "perpetual daemon", unless the kernel crashes) it's a lot simpler to manage as a source of truth especially when there is not that much info. There is a limit when there are multiple pieces (to use your netfilter example) because you need another layer to coordinate things. Re: the XDP part - our key reason is mostly managerial, in that the filter is the lifetime manager of the pipeline; and that if i dump that filter i can see all the details in regards to the pipeline(tc, XDP and in future hw, etc) in one spot. You are right, the link pinning is our protection from someone replacing the XDP prog (this was a tip from Toke in the early days) and the comparison of tc holding inode is apropos. There's some history: in the early days we were also using metadata which comes from the XDP program at the tc layer if more processing was to be done (and there was extra metadata which told us which XDP prog produced it which we would vet before trusting the metadata). Given all the above, we should still be able to hold this info without necessarily holding the extra refcount and be able to see this detail. So we can remove the refcounting. cheers, jamal > Thanks, > Daniel