On 6/3/21 11:31 PM, Kumar Kartikeya Dwivedi wrote:
This commit introduces a bpf_link based kernel API for creating tc filters and using the cls_bpf classifier. Only a subset of what netlink API offers is supported, things like TCA_BPF_POLICE, TCA_RATE and embedded actions are unsupported. The kernel API and the libbpf wrapper added in a subsequent patch are more opinionated and mirror the semantics of low level netlink based TC-BPF API, i.e. always setting direct action mode, always setting protocol to ETH_P_ALL, and only exposing handle and priority as the variables the user can control. We add an additional gen_flags parameter though to allow for offloading use cases. It would be trivial to extend the current API to support specifying other attributes in the future, but for now I'm sticking how we want to push usage. The semantics around bpf_link support are as follows: A user can create a classifier attached to a filter using the bpf_link API, after which changing it and deleting it only happens through the bpf_link API. It is not possible to bind the bpf_link to existing filter, and any such attempt will fail with EEXIST. Hence EEXIST can be returned in two cases, when existing bpf_link owned filter exists, or existing netlink owned filter exists. Removing bpf_link owned filter from netlink returns EPERM, denoting that netlink is locked out from filter manipulation when bpf_link is involved. Whenever a filter is detached due to chain removal, or qdisc tear down, or net_device shutdown, the bpf_link becomes automatically detached. In this way, the netlink API and bpf_link creation path are exclusive and don't stomp over one another. Filters created using bpf_link API cannot be replaced by netlink API, and filters created by netlink API are never replaced by bpf_link. Netfilter also cannot detach bpf_link filters. We serialize all changes dover rtnl_lock as cls_bpf API doesn't support the
dover => over?
unlocked classifier API. Reviewed-by: Toke Høiland-Jørgensen <toke@xxxxxxxxxx>. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> --- include/linux/bpf_types.h | 3 + include/net/pkt_cls.h | 13 ++ include/net/sch_generic.h | 6 +- include/uapi/linux/bpf.h | 15 +++ kernel/bpf/syscall.c | 10 +- net/sched/cls_api.c | 139 ++++++++++++++++++++- net/sched/cls_bpf.c | 250 +++++++++++++++++++++++++++++++++++++- 7 files changed, 430 insertions(+), 6 deletions(-)
[...]
subsys_initcall(tc_filter_init); + +#if IS_ENABLED(CONFIG_NET_CLS_BPF) + +int bpf_tc_link_attach(union bpf_attr *attr, struct bpf_prog *prog) +{ + struct net *net = current->nsproxy->net_ns; + struct tcf_chain_info chain_info; + u32 chain_index, prio, parent; + struct tcf_block *block; + struct tcf_chain *chain; + struct tcf_proto *tp; + int err, tp_created; + unsigned long cl; + struct Qdisc *q; + __be16 protocol; + void *fh; + + /* Caller already checks bpf_capable */ + if (!ns_capable(current->nsproxy->net_ns->user_ns, CAP_NET_ADMIN))
net->user_ns?
+ return -EPERM; + + if (attr->link_create.flags || + !attr->link_create.target_ifindex || + !tc_flags_valid(attr->link_create.tc.gen_flags)) + return -EINVAL; +
[...]