Re: [PATCH v10 net-next 15/15] p4tc: add P4 classifier

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 2/16/24 10:18 PM, Jamal Hadi Salim wrote:
On Thu, Jan 25, 2024 at 12:59 PM Jamal Hadi Salim <jhs@xxxxxxxxxxxx> wrote:
On Thu, Jan 25, 2024 at 10:47 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
On 1/24/24 3:40 PM, Jamal Hadi Salim wrote:
On Wed, Jan 24, 2024 at 8:59 AM Daniel Borkmann <daniel@xxxxxxxxxxxxx> wrote:
On 1/22/24 8:48 PM, Jamal Hadi Salim wrote:
[...]

It should also be noted that it is feasible to split some of the ingress
datapath into XDP first and more into TC later (as was shown above for
example where the parser runs at XDP level). YMMV.
Regardless of choice of which scheme to use, none of these will affect
UAPI. It will all depend on whether you generate code to load on XDP vs
tc, etc.

Co-developed-by: Victor Nogueira <victor@xxxxxxxxxxxx>
Signed-off-by: Victor Nogueira <victor@xxxxxxxxxxxx>
Co-developed-by: Pedro Tammela <pctammela@xxxxxxxxxxxx>
Signed-off-by: Pedro Tammela <pctammela@xxxxxxxxxxxx>
Signed-off-by: Jamal Hadi Salim <jhs@xxxxxxxxxxxx>

My objections from last iterations still stand, and I also added a nak,
so please do not just drop it with new revisions.. from the v10 as you
wrote you added further code but despite the various community feedback
the design still stands as before, therefore:

Nacked-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>

We didnt make code changes - but did you read the cover letter and the
extended commentary in this patch's commit log? We should have
mentioned it in the changes log. It did respond to your comments.
There's text that says "the filter manages the lifetime of the
pipeline" - which in the future could include not only tc but XDP but
also the hardware path (in the form of a file that gets loaded). I am
not sure if that message is clear. Your angle being this is layer
violation. In the last discussion i asked you for suggestions and we
went the tcx route, which didnt make sense, and  then you didnt
respond.
[...]

Also as mentioned earlier I don't think tc should hold references on
XDP programs in here. It doesn't make any sense aside from the fact
that the cls_p4 is also not doing anything with it. This is something
that a user space control plane should be doing i.e. managing a XDP
link on the target device.

This is the same argument about layer violation that you made earlier.
The filter manages the p4 pipeline - i.e it's not just about the ebpf
blob(s) but for example in the future (discussions are still ongoing
with vendors who have P4 NICs) a filter could be loaded to also
specify the location of the hardware blob.

Ah, so there is a plan to eventually add HW offload support for cls_p4?
Or is this only specifiying a location of a blob through some opaque
cookie value from user space?

Current thought process is it will be something along these lines (the
commit provides more details):

tc filter add block 22 ingress protocol all prio 1 p4 pname simple_l3 \
    prog type hw filename "mypnameprog.o" ... \
    prog type xdp obj $PARSER.o section parser/xdp pinned_link
/sys/fs/bpf/mylink \
    action bpf obj $PROGNAME.o section prog/tc-ingress

These discussions are still ongoing - but that is the current
consensus. Note: we are not pushing any code for that, but hope it
paints the bigger picture....
The idea is the cls p4 owns the lifetime of the pipeline. Installing
the filter instantiates the p4 pipeline "simple_l3" and triggers a lot
of the refcounts to make sure the pipeline and its components stays
alive.
There could be multiple such filters - when someone deletes the last
filter, then it is safe to delete the pipeline.
Essentially the filter manages the lifetime of the pipeline.

I would be happy with a suggestion that gets us moving forward with
that context in mind.

My question on the above is mainly what does it bring you to hold a
reference on the XDP program? There is no guarantee that something else
will get loaded onto XDP, and then eventually the cls_p4 is the only
entity holding the reference but w/o 'purpose'. We do have BPF links
and the user space component orchestrating all this needs to create
and pin the BPF link in BPF fs, for example. An artificial reference
on XDP prog feels similar as if you'd hold a reference on an inode
out of tc.. Again, that should be delegated to the control plane you
have running interacting with the compiler which then manages and
loads its artifacts. What if you would also need to set up some
netfilter rules for the SW pipeline, would you then embed this too?

Sorry, a slight tangent first:
P4 is self-contained, there are a handful of objects that are defined
by the spec (externs, actions, tables, etc) and we model them in the
patchset, so that part is self-contained. For the extra richness such
as the netfilter example you quoted - based on my many years of
experience deploying SDN - using daemons(sorry if i am reading too
much in what I think you are implying) for control is not the best
option i.e you need all kinds of coordination - for example where do
you store state, what happens when the daemon dies, how do you
graceful restarts etc. Based on that, if i can put things in the
kernel (which is essentially a "perpetual daemon", unless the kernel
crashes) it's a lot simpler to manage as a source of truth especially
when there is not that much info. There is a limit when there are
multiple pieces (to use your netfilter example) because you need
another layer to coordinate things.

'source of truth' for the various attach points or BPF links, yes, but in
this case here it is not, since the source of truth on what is attached
is not in cls_p4 but rather on the XDP link. How do you handle the case
when cls_p4 says something different to what is /actually/ attached? Why
is it not enough to establish some convention in user space, to pin the
link and retrieve/update from there when needed? Like everyone else does.
... even if you consider iproute2 your "control plane" (which I have the
feeling you do)?

Re: the XDP part - our key reason is mostly managerial, in that the
filter is the lifetime manager of the pipeline; and that if i dump

This is imho the problematic part which feels like square peg in round
hole, trying to fit this whole lifetime manager of the pipeline into
the cls_p4 filter. We agree to disagree here. Instead of reusing
individual building blocks from user space, this tries to cramp control
plane parts into the kernel for which its not a great fit with what is
build here as-is.

that filter i can see all the details in regards to the pipeline(tc,
XDP and in future hw, etc) in one spot. You are right, the link
pinning is our protection from someone replacing the XDP prog (this
was a tip from Toke in the early days) and the comparison of tc
holding inode is apropos.
There's some history: in the early days we were also using metadata
which comes from the XDP program at the tc layer if more processing
was to be done (and there was extra metadata which told us which XDP
prog produced it which we would vet before trusting the metadata).
Given all the above, we should still be able to hold this info without
necessarily holding the extra refcount and be able to see this detail.
So we can remove the refcounting.

Daniel?

The refcount should definitely be removed, but then again, see the point
above in that it is inconsistent information. Why can't this be done in
user space with some convention in your user space control plane - if you
take iproute2, then why it cannot pin the link in a bpf fs instance and
retrieve it from there?




[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux