On Tue, Apr 6, 2021 at 3:06 AM Toke Høiland-Jørgensen <toke@xxxxxxxxxx> wrote: > > Andrii Nakryiko <andrii.nakryiko@xxxxxxxxx> writes: > > > On Sat, Apr 3, 2021 at 10:47 AM Alexei Starovoitov > > <alexei.starovoitov@xxxxxxxxx> wrote: > >> > >> On Sat, Apr 03, 2021 at 12:38:06AM +0530, Kumar Kartikeya Dwivedi wrote: > >> > On Sat, Apr 03, 2021 at 12:02:14AM IST, Alexei Starovoitov wrote: > >> > > On Fri, Apr 2, 2021 at 8:27 AM Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> wrote: > >> > > > [...] > >> > > > >> > > All of these things are messy because of tc legacy. bpf tried to follow tc style > >> > > with cls and act distinction and it didn't quite work. cls with > >> > > direct-action is the only > >> > > thing that became mainstream while tc style attach wasn't really addressed. > >> > > There were several incidents where tc had tens of thousands of progs attached > >> > > because of this attach/query/index weirdness described above. > >> > > I think the only way to address this properly is to introduce bpf_link style of > >> > > attaching to tc. Such bpf_link would support ingress/egress only. > >> > > direction-action will be implied. There won't be any index and query > >> > > will be obvious. > >> > > >> > Note that we already have bpf_link support working (without support for pinning > >> > ofcourse) in a limited way. The ifindex, protocol, parent_id, priority, handle, > >> > chain_index tuple uniquely identifies a filter, so we stash this in the bpf_link > >> > and are able to operate on the exact filter during release. > >> > >> Except they're not unique. The library can stash them, but something else > >> doing detach via iproute2 or their own netlink calls will detach the prog. > >> This other app can attach to the same spot a different prog and now > >> bpf_link__destroy will be detaching somebody else prog. > >> > >> > > So I would like to propose to take this patch set a step further from > >> > > what Daniel said: > >> > > int bpf_tc_attach(prog_fd, ifindex, {INGRESS,EGRESS}): > >> > > and make this proposed api to return FD. > >> > > To detach from tc ingress/egress just close(fd). > >> > > >> > You mean adding an fd-based TC API to the kernel? > >> > >> yes. > > > > I'm totally for bpf_link-based TC attachment. > > > > But I think *also* having "legacy" netlink-based APIs will allow > > applications to handle older kernels in a much nicer way without extra > > dependency on iproute2. We have a similar situation with kprobe, where > > currently libbpf only supports "modern" fd-based attachment, but users > > periodically ask questions and struggle to figure out issues on older > > kernels that don't support new APIs. > > +1; I am OK with adding a new bpf_link-based way to attach TC programs, > but we still need to support the netlink API in libbpf. > > > So I think we'd have to support legacy TC APIs, but I agree with > > Alexei and Daniel that we should keep it to the simplest and most > > straightforward API of supporting direction-action attachments and > > setting up qdisc transparently (if I'm getting all the terminology > > right, after reading Quentin's blog post). That coincidentally should > > probably match how bpf_link-based TC API will look like, so all that > > can be abstracted behind a single bpf_link__attach_tc() API as well, > > right? That's the plan for dealing with kprobe right now, btw. Libbpf > > will detect the best available API and transparently fall back (maybe > > with some warning for awareness, due to inherent downsides of legacy > > APIs: no auto-cleanup being the most prominent one). > > Yup, SGTM: Expose both in the low-level API (in bpf.c), and make the > high-level API auto-detect. That way users can also still use the > netlink attach function if they don't want the fd-based auto-close > behaviour of bpf_link. So I thought a bit more about this, and it feels like the right move would be to expose only higher-level TC BPF API behind bpf_link. It will keep the API complexity and amount of APIs that libbpf will have to support to the minimum, and will keep the API itself simple: direct-attach with the minimum amount of input arguments. By not exposing low-level APIs we also table the whole bpf_tc_cls_attach_id design discussion, as we now can keep as much info as needed inside bpf_link_tc (which will embed bpf_link internally as well) to support detachment and possibly some additional querying, if needed. I think that's the best and least controversial step forward for getting this API into libbpf. > > -Toke >