Re: [PATCH net-next v9 15/15] p4tc: add P4 classifier

Daniel Borkmann <daniel@xxxxxxxxxxxxx> · Tue, 5 Dec 2023 14:43:40 +0100

On 12/5/23 1:32 AM, John Fastabend wrote:
Jamal Hadi Salim wrote:
Introduce P4 tc classifier. A tc filter instantiated on this classifier
is used to bind a P4 pipeline to one or more netdev ports. To use P4
classifier you must specify a pipeline name that will be associated to
this filter, a s/w parser and datapath ebpf program. The pipeline must have
already been created via a template.
For example, if we were to add a filter to ingress of network interface
device $P0 and associate it to P4 pipeline simple_l3 we'd issue the
following command:

In addition to my comments from last iteration.

tc filter add dev $P0 parent ffff: protocol all prio 6 p4 pname simple_l3 \
     action bpf obj $PARSER.o section prog/tc-parser \
     action bpf obj $PROGNAME.o section prog/tc-ingress

Having multiple object files is a mistake IMO and will cost
performance. Have a single object file, avoid stitching together
metadata, and run to completion. And then run entirely from XDP;
this is how we have been getting good performance numbers.

+1, fully agree.

$PROGNAME.o and $PARSER.o are the compiled eBPF programs generated
by the P4 compiler and together are the representation of the P4 program.
Note that filter understands that $PARSER.o is a parser to be loaded
at the tc level. The datapath program is merely an eBPF action.

Note we do support a distinct way of loading the parser as opposed to
making it be an action, the above example would be:

tc filter add dev $P0 parent ffff: protocol all prio 6 p4 pname simple_l3 \
     prog type tc obj $PARSER.o ... \
     action bpf obj $PROGNAME.o section prog/tc-ingress

We support two types of loadings of these initial programs in the pipeline
and differentiate between what gets loaded at tc vs xdp by using syntax of

either "prog type tc obj" or "prog type xdp obj"

For XDP:

tc filter add dev $P0 ingress protocol all prio 1 p4 pname simple_l3 \
     prog type xdp obj $PARSER.o section parser/xdp \
     pinned_link /sys/fs/bpf/mylink \
     action bpf obj $PROGNAME.o section prog/tc-ingress

I don't think tc should be loading xdp programs. XDP is not 'tc'.

For XDP, we do have a separate attach API: for BPF links we have bpf_xdp_link_attach()
via bpf(2), and for regular progs we have the classic way via dev_change_xdp_fd() with
IFLA_XDP_* attributes. Mid-term we'll also add bpf_mprog support for XDP to allow
multi-user attachment. tc kernel code should not add yet another way of attaching XDP;
the userspace control plane should just reuse the existing uapi infra instead.
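
As a rough sketch of what that could look like from the control plane side, the
XDP parser can be attached through the classic netlink path (IFLA_XDP_*) with
plain iproute2, independent of any tc filter. The helper name attach_xdp_parser
and the DRY_RUN switch are made up for illustration, the eth0/parser fallbacks
are placeholders, and the section name is just the one from the example above.
A BPF-link based attach would go through bpf(2), e.g. via libbpf, and is not
shown here.

```shell
#!/bin/sh
# Attach the XDP parser via the existing netlink uapi (IFLA_XDP_*) instead
# of going through the tc filter. With DRY_RUN=1 (the default in this
# sketch) the command is only printed, so it can be checked without root.
attach_xdp_parser() {
    dev=$1 obj=$2 sec=$3
    # Build the iproute2 command line as the positional parameters.
    set -- ip link set dev "$dev" xdp obj "$obj" sec "$sec"
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# $P0 and $PARSER are the variables used in the examples above;
# eth0 and parser are placeholder fallbacks.
attach_xdp_parser "${P0:-eth0}" "${PARSER:-parser}.o" parser/xdp
```

Running it with DRY_RUN=0 as root would perform the actual attach; the tc side
then only needs to deal with tc programs.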

The theory of operations is as follows:

================================1. PARSING================================

The packet first encounters the parser.
The parser is implemented in ebpf residing either at the TC or XDP
level. The parsed header values are stored in a shared eBPF map.
When the parser runs at XDP level, we load it into XDP using tc filter
command and pin it to a file.

=============================2. ACTIONS=============================

In the above example, the P4 program (minus the parser) is encoded in an
action($PROGNAME.o). It should be noted that classical tc actions
continue to work:
IOW, someone could decide to add a mirred action to mirror all packets
after or before the ebpf action.

tc filter add dev $P0 parent ffff: protocol all prio 6 p4 pname simple_l3 \
     prog type tc obj $PARSER.o section parser/tc-ingress \
     action bpf obj $PROGNAME.o section prog/tc-ingress \
     action mirred egress mirror index 1 dev $P1 \
     action bpf obj $ANOTHERPROG.o section mysect/section-1

It should also be noted that it is feasible to split some of the ingress
datapath into XDP first and more into TC later (as shown in the example
above, where the parser runs at XDP level). YMMV.

Is there any performance value in partial XDP and partial TC? The main
wins we see in XDP are when we can drop, redirect, etc. the packet
entirely in XDP and avoid the skb altogether.

Co-developed-by: Victor Nogueira <victor@xxxxxxxxxxxx>
Signed-off-by: Victor Nogueira <victor@xxxxxxxxxxxx>
Co-developed-by: Pedro Tammela <pctammela@xxxxxxxxxxxx>
Signed-off-by: Pedro Tammela <pctammela@xxxxxxxxxxxx>
Signed-off-by: Jamal Hadi Salim <jhs@xxxxxxxxxxxx>

The cls_p4 is roughly a copy of {cls,act}_bpf, and from a BPF community side
we moved away from this some time ago for the benefit of a better management
API for tc BPF programs via bpf(2) through bpf_mprog (see libbpf and BPF selftests
around this), as mentioned earlier. Please use this instead for your userspace
control plane, otherwise we are repeating the same mistakes from the past again
that were already fixed. Therefore, from BPF side:

Nacked-by: Daniel Borkmann <daniel@xxxxxxxxxxxxx>

Cheers,
Daniel