On Mon 29 Mar 2021 at 15:32, Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote:
> Vlad Buslov <vladbu@xxxxxxxxxx> writes:
>
>> On Thu 25 Mar 2021 at 14:00, Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote:
>>> This adds functions that wrap the netlink API used for adding,
>>> manipulating, and removing filters and actions. These functions operate
>>> directly on the loaded prog's fd, and return a handle to the filter and
>>> action using an out parameter (id for tc_cls, and index for tc_act).
>>>
>>> The basic featureset is covered to allow for attaching, manipulation of
>>> properties, and removal of filters and actions. Some additional features
>>> like TCA_BPF_POLICE and TCA_RATE for tc_cls have been omitted. These can
>>> be added on top later by extending the bpf_tc_cls_opts struct.
>>>
>>> Support for binding actions directly to a classifier by passing them in
>>> during filter creation has also been omitted for now. These actions
>>> have an auto clean up property because their lifetime is bound to the
>>> filter they are attached to. This can be added later, but was omitted
>>> for now as direct action mode is a better alternative to it.
>>>
>>> An API summary:
>>>
>>> The BPF TC-CLS API
>>>
>>> bpf_tc_cls_{attach, change, replace}_{dev, block} may be used to attach,
>>> change, and replace SCHED_CLS bpf classifiers. Separate sets of functions
>>> are provided for network interfaces and shared filter blocks.
>>>
>>> bpf_tc_cls_detach_{dev, block} may be used to detach an existing SCHED_CLS
>>> filter. The bpf_tc_cls_attach_id object filled in during attach,
>>> change, or replace must be passed in to the detach functions for them to
>>> remove the filter and its attached classifier correctly.
>>>
>>> bpf_tc_cls_get_info is a helper that can be used to obtain attributes
>>> for the filter and classifier.
>>> The opts structure may be used to
>>> choose the granularity of search, such that info for a specific filter
>>> corresponding to the same loaded bpf program can be obtained. By
>>> default, the first match is returned to the user.
>>>
>>> Examples:
>>>
>>>	struct bpf_tc_cls_attach_id id = {};
>>>	struct bpf_object *obj;
>>>	struct bpf_program *p;
>>>	int fd, r;
>>>
>>>	obj = bpf_object__open("foo.o");
>>>	if (IS_ERR_OR_NULL(obj))
>>>		return PTR_ERR(obj);
>>>
>>>	p = bpf_object__find_program_by_title(obj, "classifier");
>>>	if (IS_ERR_OR_NULL(p))
>>>		return PTR_ERR(p);
>>>
>>>	if (bpf_object__load(obj) < 0)
>>>		return -1;
>>>
>>>	fd = bpf_program__fd(p);
>>>
>>>	r = bpf_tc_cls_attach_dev(fd, if_nametoindex("lo"),
>>>				  BPF_TC_CLSACT_INGRESS, ETH_P_IP,
>>>				  NULL, &id);
>>>	if (r < 0)
>>>		return r;
>>>
>>> ... which is roughly equivalent to (after clsact qdisc setup):
>>>	# tc filter add dev lo ingress bpf obj /home/kkd/foo.o sec classifier
>>>
>>> If a user wishes to modify existing options on an attached filter, the
>>> bpf_tc_cls_change_{dev, block} API may be used. Parameters like
>>> chain_index, priority, and handle are ignored in the bpf_tc_cls_opts
>>> struct as they cannot be modified after attaching a filter.
>>>
>>> Example:
>>>
>>>	/* Optional parameters necessary to select the right filter */
>>>	DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts,
>>>			    .handle = id.handle,
>>>			    .priority = id.priority,
>>>			    .chain_index = id.chain_index);
>>>	/* Turn on direct action mode */
>>>	opts.direct_action = true;
>>>	r = bpf_tc_cls_change_dev(fd, id.ifindex, id.parent_id,
>>>				  id.protocol, &opts, &id);
>>>	if (r < 0)
>>>		return r;
>>>
>>>	/* Verify that the direct action mode has been set */
>>>	struct bpf_tc_cls_info info = {};
>>>	r = bpf_tc_cls_get_info_dev(fd, id.ifindex, id.parent_id,
>>>				    id.protocol, &opts, &info);
>>>	if (r < 0)
>>>		return r;
>>>
>>>	assert(info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT);
>>>
>>> This would be roughly equivalent to doing:
>>>	# tc filter change dev lo ingress prio <p> handle <h> bpf obj /home/kkd/foo.o section classifier da
>>>
>>> ... except a new bpf program will be loaded and replace the existing one.
>>>
>>> If a user wishes to either replace an existing filter, or create a new
>>> one with the same properties, they can use bpf_tc_cls_replace_dev. The
>>> benefit of bpf_tc_cls_change is that it fails if no matching filter
>>> exists.
>>>
>>> The BPF TC-ACT API
>>>
>>> bpf_tc_act_{attach, replace} may be used to attach and replace already
>>> attached SCHED_ACT actions. Passing an index of 0 has special meaning,
>>> in that an index will be automatically chosen by the kernel. The index
>>> chosen by the kernel is the return value of these functions in case of
>>> success.
>>>
>>> bpf_tc_act_detach may be used to detach a SCHED_ACT action prog
>>> identified by the index parameter. The index 0 again has a special
>>> meaning, in that passing it will flush all existing SCHED_ACT actions
>>> loaded using the ACT API.
>>>
>>> bpf_tc_act_get_info is a helper to get the required attributes of a
>>> loaded program to be able to manipulate it further, by passing them
>>> into the aforementioned functions.
>>>
>>> Example:
>>>
>>>	struct bpf_object *obj;
>>>	struct bpf_program *p;
>>>	__u32 index;
>>>	int fd, r;
>>>
>>>	obj = bpf_object__open("foo.o");
>>>	if (IS_ERR_OR_NULL(obj))
>>>		return PTR_ERR(obj);
>>>
>>>	p = bpf_object__find_program_by_title(obj, "action");
>>>	if (IS_ERR_OR_NULL(p))
>>>		return PTR_ERR(p);
>>>
>>>	if (bpf_object__load(obj) < 0)
>>>		return -1;
>>>
>>>	fd = bpf_program__fd(p);
>>>
>>>	r = bpf_tc_act_attach(fd, NULL, &index);
>>>	if (r < 0)
>>>		return r;
>>>
>>>	if (bpf_tc_act_detach(index))
>>>		return -1;
>>>
>>> ... which is equivalent to the following sequence:
>>>	tc action add action bpf obj /home/kkd/foo.o sec action
>>>	tc action del action bpf index <idx>
>>
>> How do you handle the locking here? Please note that while
>> RTM_{NEW|GET|DEL}FILTER API has been refactored to handle its own
>> locking internally (and registered with RTNL_FLAG_DOIT_UNLOCKED flag),
>> RTM_{NEW|GET|DEL}ACTION API still expects to be called with rtnl lock
>> taken.
>
> Huh, locking? This is all userspace code that uses the netlink API...
>
> -Toke

Thanks for the clarification. I'm not familiar with libbpf internals and
it wasn't obvious to me that this functionality is not for creating
classifiers/actions from a BPF program executing in kernel space.
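
For reference, a minimal sketch of what "userspace code that uses the
netlink API" amounts to at the socket level. This is not code from the
patch: struct tc_filter_req, build_get_filter_req() and send_req() are
hypothetical names chosen for illustration, and the sketch only builds a
plain RTM_GETTFILTER dump request for the clsact ingress hook. Nothing in
it takes a lock; whatever locking the request needs is done by the
kernel's handler for the message.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/pkt_sched.h>

/* Hypothetical request layout: netlink header followed by the tc header. */
struct tc_filter_req {
	struct nlmsghdr nh;
	struct tcmsg t;
};

/* Hypothetical helper: fill in a dump request for the filters attached to
 * the clsact ingress hook of the given interface.
 */
void build_get_filter_req(struct tc_filter_req *req, int ifindex, __u32 seq)
{
	assert(req != NULL);
	memset(req, 0, sizeof(*req));

	req->nh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
	req->nh.nlmsg_type = RTM_GETTFILTER;
	req->nh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req->nh.nlmsg_seq = seq;

	req->t.tcm_family = AF_UNSPEC;
	req->t.tcm_ifindex = ifindex;
	req->t.tcm_parent = TC_H_MAKE(TC_H_CLSACT, TC_H_MIN_INGRESS);
}

/* Hypothetical helper: send the request to the kernel over a
 * NETLINK_ROUTE socket. Reply parsing is left out of this sketch.
 * Returns 0 on success and -1 on failure.
 */
int send_req(const struct tc_filter_req *req)
{
	struct sockaddr_nl kernel;
	ssize_t n;
	int fd;

	memset(&kernel, 0, sizeof(kernel));
	kernel.nl_family = AF_NETLINK;	/* nl_pid 0 addresses the kernel */

	fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
	if (fd < 0)
		return -1;

	/* Userspace only hands the message over; it takes no lock here. */
	n = sendto(fd, req, req->nh.nlmsg_len, 0,
		   (struct sockaddr *)&kernel, sizeof(kernel));
	close(fd);

	return n == (ssize_t)req->nh.nlmsg_len ? 0 : -1;
}
```

Calling build_get_filter_req(&req, if_nametoindex("lo"), 1) followed by
send_req(&req) would ask the kernel to dump the filters on lo's ingress
hook. An RTM_{NEW|GET|DEL}ACTION request is sent the same way from
userspace, with a struct tcamsg header instead of struct tcmsg.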