On Fri, May 28, 2021 at 1:00 PM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote: > > This is the first RFC version. > > This adds a bpf_link path to create TC filters tied to cls_bpf classifier, and > introduces fd based ownership for such TC filters. Netlink cannot delete or > replace such filters, but the bpf_link is severed on indirect destruction of the > filter (backing qdisc being deleted, or chain being flushed, etc.). To ensure > that filters remain attached beyond process lifetime, the usual bpf_link fd > pinning approach can be used. > > The individual patches contain more details and comments, but the overall kernel > API and libbpf helper mirrors the semantics of the netlink based TC-BPF API > merged recently. This means that we start by always setting direct action mode, > protocol to ETH_P_ALL, chain_index as 0, etc. If there is a need for more > options in the future, they can be easily exposed through the bpf_link API in > the future. > > Patch 1 refactors cls_bpf change function to extract two helpers that will be > reused in bpf_link creation. > > Patch 2 exports some bpf_link management functions to modules. This is needed > because our bpf_link object is tied to the cls_bpf_prog object. Tying it to > tcf_proto would be weird, because the update path has to replace offloaded bpf > prog, which happens using internal cls_bpf helpers, and would in general be more > code to abstract over an operation that is unlikely to be implemented for other > filter types. > > Patch 3 adds the main bpf_link API. A function in cls_api takes care of > obtaining block reference, creating the filter object, and then calls the > bpf_link_change tcf_proto op (only supported by cls_bpf) that returns a fd after > setting up the internal structures. An optimization is made to not keep around > resources for extended actions, which is explained in a code comment as it wasn't > immediately obvious. > > Patch 4 adds an update path for bpf_link. Since bpf_link_update only supports > replacing the bpf_prog, we can skip tc filter's change path by reusing the > filter object but swapping its bpf_prog. This takes care of replacing the > offloaded prog as well (if that fails, update is aborted). So far however, > tcf_classify could do normal load (possibly torn) as the cls_bpf_prog->filter > would never be modified concurrently. This is no longer true, and to not > penalize the classify hot path, we also cannot impose serialization around > its load. Hence the load is changed to READ_ONCE, so that the pointer value is > always consistent. Due to invocation in a RCU critical section, the lifetime of > the prog is guaranteed for the duration of the call. > > Patch 5, 6 take care of updating the userspace bits and add a bpf_link returning > function to libbpf. > > Patch 7 adds a selftest that exercises all possible problematic interactions > that I could think of. > > Design: > > This is where in the object hierarchy our bpf_link object is attached. > > ┌─────┐ > │ │ > │ BPF │ > program > │ │ > └──▲──┘ > ┌───────┐ │ > │ │ ┌──────┴───────┐ > │ mod ├─────────► cls_bpf_prog │ > ┌────────────────┐ │cls_bpf│ └────┬───▲─────┘ > │ tcf_block │ │ │ │ │ > └────────┬───────┘ └───▲───┘ │ │ > │ ┌─────────────┐ │ ┌─▼───┴──┐ > └──────────► tcf_chain │ │ │bpf_link│ > └───────┬─────┘ │ └────────┘ > │ ┌─────────────┐ │ > └──────────► tcf_proto ├────┘ > └─────────────┘ > > The bpf_link is detached on destruction of the cls_bpf_prog. Doing it this way > allows us to implement update in a lightweight manner without having to recreate > a new filter, where we can just replace the BPF prog attached to cls_bpf_prog. > > The other way to do it would be to link the bpf_link to tcf_proto, there are > numerous downsides to this: > > 1. All filters have to embed the pointer even though they won't be using it when > cls_bpf is compiled in. > 2. This probably won't make sense to be extended to other filter types anyway. > 3. We aren't able to optimize the update case without adding another bpf_link > specific update operation to tcf_proto ops. > > The downside with tying this to the module is having to export bpf_link > management functions and introducing a tcf_proto op. Hopefully the cost of > another operation func pointer is not big enough (as there is only one ops > struct per module). > > This first version is to collect feedback on the approach and get ideas if there > is a better way to do this. Bpf_link-based TC API is a long time coming, so it's great to see someone finally working on this. Thanks! I briefly skimmed through the patch set, noticed a few generic bpf_link problems. But I think main feedback will come from Cilium folks and others that heavily rely on TC APIs. I wonder if there is an opportunity to simplify the API further given we have a new opportunity here. I don't think we are constrained to follow legacy TC API exactly. The problem is that your patch set was marked as spam by Google, so I suspect a bunch of folks haven't gotten it. I suggest re-sending it again but trimming down the CC list, leaving only bpf@vger, netdev@vger, and BPF maintainers CC'ed directly. > > Kumar Kartikeya Dwivedi (7): > net: sched: refactor cls_bpf creation code > bpf: export bpf_link functions for modules > net: sched: add bpf_link API for bpf classifier > net: sched: add lightweight update path for cls_bpf > tools: bpf.h: sync with kernel sources > libbpf: add bpf_link based TC-BPF management API > libbpf: add selftest for bpf_link based TC-BPF management API > > include/linux/bpf_types.h | 3 + > include/net/pkt_cls.h | 13 + > include/net/sch_generic.h | 6 +- > include/uapi/linux/bpf.h | 15 + > kernel/bpf/syscall.c | 14 +- > net/sched/cls_api.c | 138 ++++++- > net/sched/cls_bpf.c | 386 ++++++++++++++++-- > tools/include/uapi/linux/bpf.h | 15 + > tools/lib/bpf/bpf.c | 5 + > tools/lib/bpf/bpf.h | 8 +- > tools/lib/bpf/libbpf.c | 59 ++- > tools/lib/bpf/libbpf.h | 17 + > tools/lib/bpf/libbpf.map | 1 + > tools/lib/bpf/netlink.c | 5 +- > tools/lib/bpf/netlink.h | 8 + > .../selftests/bpf/prog_tests/tc_bpf_link.c | 285 +++++++++++++ > 16 files changed, 934 insertions(+), 44 deletions(-) > create mode 100644 tools/lib/bpf/netlink.h > create mode 100644 tools/testing/selftests/bpf/prog_tests/tc_bpf_link.c > > -- > 2.31.1 >