On Thu, Jun 3, 2021 at 11:32 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
>
> This commit introduces a bpf_link based kernel API for creating tc
> filters and using the cls_bpf classifier. Only a subset of what netlink
> API offers is supported, things like TCA_BPF_POLICE, TCA_RATE and
> embedded actions are unsupported.
>
> The kernel API and the libbpf wrapper added in a subsequent patch are
> more opinionated and mirror the semantics of low level netlink based
> TC-BPF API, i.e. always setting direct action mode, always setting
> protocol to ETH_P_ALL, and only exposing handle and priority as the
> variables the user can control. We add an additional gen_flags parameter
> though to allow for offloading use cases. It would be trivial to extend
> the current API to support specifying other attributes in the future,
> but for now I'm sticking with how we want to push usage.
>
> The semantics around bpf_link support are as follows:
>
> A user can create a classifier attached to a filter using the bpf_link
> API, after which changing it and deleting it only happens through the
> bpf_link API. It is not possible to bind the bpf_link to existing
> filter, and any such attempt will fail with EEXIST. Hence EEXIST can be
> returned in two cases, when existing bpf_link owned filter exists, or
> existing netlink owned filter exists.
>
> Removing bpf_link owned filter from netlink returns EPERM, denoting that
> netlink is locked out from filter manipulation when bpf_link is
> involved.
>
> Whenever a filter is detached due to chain removal, or qdisc tear down,
> or net_device shutdown, the bpf_link becomes automatically detached.
>
> In this way, the netlink API and bpf_link creation path are exclusive
> and don't stomp over one another. Filters created using bpf_link API
> cannot be replaced by netlink API, and filters created by netlink API are
> never replaced by bpf_link. Netlink also cannot detach bpf_link filters.
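
For reference, this is how I read the ownership rules described above, as a toy userspace model. It is only a sketch and not code from the patch: the enum, the struct and the model_* helpers are all hypothetical names, and the -ENOENT case is my assumption since the commit message doesn't mention it.

```c
#include <assert.h>
#include <errno.h>

/* Who currently owns the filter at a given parent/handle/priority. */
enum filter_owner { OWNER_NONE, OWNER_NETLINK, OWNER_BPF_LINK };

struct filter_slot {
	enum filter_owner owner;
};

/* bpf_link creation never binds to, or replaces, an existing filter. */
static inline int model_link_create(struct filter_slot *slot)
{
	if (slot->owner != OWNER_NONE)
		return -EEXIST;	/* either owner kind blocks creation */
	slot->owner = OWNER_BPF_LINK;
	return 0;
}

/* Netlink can only delete filters that it owns itself. */
static inline int model_netlink_delete(struct filter_slot *slot)
{
	if (slot->owner == OWNER_BPF_LINK)
		return -EPERM;	/* netlink is locked out */
	if (slot->owner == OWNER_NONE)
		return -ENOENT;	/* assumption, not stated in the commit */
	slot->owner = OWNER_NONE;
	return 0;
}

/* Chain removal, qdisc teardown or net_device shutdown detaches the
 * filter no matter who owns it; a bpf_link would become detached here.
 */
static inline void model_teardown(struct filter_slot *slot)
{
	slot->owner = OWNER_NONE;
}
```

With this model, a second model_link_create() on the same slot gives -EEXIST, and model_netlink_delete() on a link-owned slot gives -EPERM, which matches the two EEXIST cases and the EPERM case listed in the commit message.
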
>
> We serialize all changes over rtnl_lock as cls_bpf API doesn't support the
> unlocked classifier API.
>
> Reviewed-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>
> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
> ---
>  include/linux/bpf_types.h |   3 +
>  include/net/pkt_cls.h     |  13 ++
>  include/net/sch_generic.h |   6 +-
>  include/uapi/linux/bpf.h  |  15 +++
>  kernel/bpf/syscall.c      |  10 +-
>  net/sched/cls_api.c       | 139 ++++++++++++++++++++-
>  net/sched/cls_bpf.c       | 250 +++++++++++++++++++++++++++++++++++++-
>  7 files changed, 430 insertions(+), 6 deletions(-)
>

[...]

> @@ -1447,6 +1449,12 @@ union bpf_attr {
>                                 __aligned_u64   iter_info;      /* extra bpf_iter_link_info */
>                                 __u32           iter_info_len;  /* iter_info length */
>                         };
> +                       struct { /* used by BPF_TC */
> +                               __u32 parent;
> +                               __u32 handle;
> +                               __u32 gen_flags;

There is already link_create.flags, whose meaning is entirely up to the
specific type of bpf_link. E.g., cgroup bpf_link doesn't accept any
flags, while xdp bpf_link uses it for passing XDP-specific flags. Is
there a need to have both gen_flags and flags for TC link?

> +                               __u16 priority;

No strong preference, but we typically try not to have unnecessary
padding in UAPI bpf_attr, so I wonder if using __u32 for this would
make sense?

> +                       } tc;
>                 };
>         } link_create;
>
> @@ -5519,6 +5527,13 @@ struct bpf_link_info {
>                 struct {
>                         __u32 ifindex;
>                 } xdp;
> +               struct {
> +                       __u32 ifindex;
> +                       __u32 parent;
> +                       __u32 handle;
> +                       __u32 gen_flags;
> +                       __u16 priority;
> +               } tc;
>         };
> } __attribute__((aligned(8)));
>

[...]
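
To make the padding point concrete, here is a small userspace sketch. It is not kernel code: uint32_t/uint16_t stand in for __u32/__u16, and the struct names and the TAIL_PAD macro are made up for illustration. It assumes a common ABI where uint32_t is 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout as posted: three 32-bit fields followed by a 16-bit priority. */
struct tc_link_create_u16 {
	uint32_t parent;
	uint32_t handle;
	uint32_t gen_flags;
	uint16_t priority;
};

/* Hypothetical alternative with priority widened to 32 bits. */
struct tc_link_create_u32 {
	uint32_t parent;
	uint32_t handle;
	uint32_t gen_flags;
	uint32_t priority;
};

/* Bytes between the end of member 'last' and the end of the struct. */
#define TAIL_PAD(type, last) \
	(sizeof(type) - (offsetof(type, last) + sizeof(((type *)0)->last)))

/* The 16-bit variant carries two bytes of implicit tail padding... */
_Static_assert(TAIL_PAD(struct tc_link_create_u16, priority) == 2,
	       "expected 2 bytes of tail padding with a 16-bit priority");
/* ...while the 32-bit variant has none and is no larger overall. */
_Static_assert(TAIL_PAD(struct tc_link_create_u32, priority) == 0,
	       "expected no tail padding with a 32-bit priority");
_Static_assert(sizeof(struct tc_link_create_u16) ==
	       sizeof(struct tc_link_create_u32),
	       "both variants should occupy 16 bytes");
```

So under those assumptions widening priority costs nothing in size; it only turns two bytes of implicit padding into part of a named field. The same reasoning should apply to the tc struct in bpf_link_info, where the extra ifindex field moves priority to offset 16 and the struct is padded from 18 to 20 bytes.
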