On Sat, Jun 05, 2021 at 08:38:17AM IST, Yonghong Song wrote:
>
>
> On 6/3/21 11:31 PM, Kumar Kartikeya Dwivedi wrote:
> > This commit introduces a bpf_link based kernel API for creating tc
> > filters and using the cls_bpf classifier. Only a subset of what netlink
> > API offers is supported, things like TCA_BPF_POLICE, TCA_RATE and
> > embedded actions are unsupported.
> >
> > The kernel API and the libbpf wrapper added in a subsequent patch are
> > more opinionated and mirror the semantics of low level netlink based
> > TC-BPF API, i.e. always setting direct action mode, always setting
> > protocol to ETH_P_ALL, and only exposing handle and priority as the
> > variables the user can control. We add an additional gen_flags parameter
> > though to allow for offloading use cases. It would be trivial to extend
> > the current API to support specifying other attributes in the future,
> > but for now I'm sticking how we want to push usage.
> >
> > The semantics around bpf_link support are as follows:
> >
> > A user can create a classifier attached to a filter using the bpf_link
> > API, after which changing it and deleting it only happens through the
> > bpf_link API. It is not possible to bind the bpf_link to existing
> > filter, and any such attempt will fail with EEXIST. Hence EEXIST can be
> > returned in two cases, when existing bpf_link owned filter exists, or
> > existing netlink owned filter exists.
> >
> > Removing bpf_link owned filter from netlink returns EPERM, denoting that
> > netlink is locked out from filter manipulation when bpf_link is
> > involved.
> >
> > Whenever a filter is detached due to chain removal, or qdisc tear down,
> > or net_device shutdown, the bpf_link becomes automatically detached.
> >
> > In this way, the netlink API and bpf_link creation path are exclusive
> > and don't stomp over one another. Filters created using bpf_link API
> > cannot be replaced by netlink API, and filters created by netlink API are
> > never replaced by bpf_link. Netfilter also cannot detach bpf_link filters.
> >
> > We serialize all changes dover rtnl_lock as cls_bpf API doesn't support the
>
> dover => over?
>

Thanks, will fix.

> > unlocked classifier API.
> >
> > Reviewed-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>.
> > Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx>
> > ---
> >  include/linux/bpf_types.h |   3 +
> >  include/net/pkt_cls.h     |  13 ++
> >  include/net/sch_generic.h |   6 +-
> >  include/uapi/linux/bpf.h  |  15 +++
> >  kernel/bpf/syscall.c      |  10 +-
> >  net/sched/cls_api.c       | 139 ++++++++++++++++++++-
> >  net/sched/cls_bpf.c       | 250 +++++++++++++++++++++++++++++++++++++-
> >  7 files changed, 430 insertions(+), 6 deletions(-)
> >
> [...]
> >  subsys_initcall(tc_filter_init);
> > +
> > +#if IS_ENABLED(CONFIG_NET_CLS_BPF)
> > +
> > +int bpf_tc_link_attach(union bpf_attr *attr, struct bpf_prog *prog)
> > +{
> > +	struct net *net = current->nsproxy->net_ns;
> > +	struct tcf_chain_info chain_info;
> > +	u32 chain_index, prio, parent;
> > +	struct tcf_block *block;
> > +	struct tcf_chain *chain;
> > +	struct tcf_proto *tp;
> > +	int err, tp_created;
> > +	unsigned long cl;
> > +	struct Qdisc *q;
> > +	__be16 protocol;
> > +	void *fh;
> > +
> > +	/* Caller already checks bpf_capable */
> > +	if (!ns_capable(current->nsproxy->net_ns->user_ns, CAP_NET_ADMIN))
>
> net->user_ns?
>

True, will fix.

> > +		return -EPERM;
> > +
> > +	if (attr->link_create.flags ||
> > +	    !attr->link_create.target_ifindex ||
> > +	    !tc_flags_valid(attr->link_create.tc.gen_flags))
> > +		return -EINVAL;
> > +
> [...]

--
Kartikeya
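
A note on the ownership rules quoted above: they can be modeled in plain
userspace C, roughly as below. This is only a toy sketch and not the kernel
implementation. `struct filter_slot`, `enum owner` and the `model_*()` helpers
are hypothetical names made up for illustration, and the -ENOENT return for
deleting an empty slot is an assumption that the commit message does not state.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical toy model, NOT kernel code: a single filter slot that is
 * either empty, owned by netlink, or owned by a bpf_link. */
enum owner { OWNER_NONE, OWNER_NETLINK, OWNER_BPF_LINK };

struct filter_slot {
	enum owner owner;
};

/* bpf_link creation never binds to an existing filter, so both a
 * netlink owned and a bpf_link owned filter yield -EEXIST. */
static int model_link_attach(struct filter_slot *s)
{
	if (s->owner != OWNER_NONE)
		return -EEXIST;
	s->owner = OWNER_BPF_LINK;
	return 0;
}

/* netlink is locked out of bpf_link owned filters: removal gives -EPERM.
 * The -ENOENT for an empty slot is an assumption of this sketch. */
static int model_netlink_del(struct filter_slot *s)
{
	if (s->owner == OWNER_BPF_LINK)
		return -EPERM;
	if (s->owner == OWNER_NONE)
		return -ENOENT;
	s->owner = OWNER_NONE;
	return 0;
}

/* Chain removal, qdisc tear down or net_device shutdown: the filter goes
 * away regardless of owner, which leaves any bpf_link detached. */
static void model_teardown(struct filter_slot *s)
{
	s->owner = OWNER_NONE;
}

/* Walks through the cases described in the commit message. */
void model_selftest(void)
{
	struct filter_slot s = { .owner = OWNER_NONE };

	assert(model_link_attach(&s) == 0);
	assert(model_link_attach(&s) == -EEXIST); /* bpf_link owned filter exists */
	assert(model_netlink_del(&s) == -EPERM);  /* netlink cannot remove it */
	model_teardown(&s);
	assert(s.owner == OWNER_NONE);            /* link auto-detached */

	s.owner = OWNER_NETLINK;
	assert(model_link_attach(&s) == -EEXIST); /* netlink owned filter exists */
	assert(model_netlink_del(&s) == 0);
}
```

Calling model_selftest() from a main() exercises both EEXIST cases, the EPERM
lockout and the automatic detach on teardown. The model deliberately leaves out
netlink replace, since the error returned there is not given in the text above.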