Re: [PATCH RFC bpf-next 0/7] Add bpf_link based TC-BPF API

Kumar Kartikeya Dwivedi <memxor@xxxxxxxxx> · Mon, 7 Jun 2021 09:07:24 +0530

On Mon, Jun 07, 2021 at 05:07:28AM IST, Cong Wang wrote:
> On Fri, May 28, 2021 at 1:00 PM Kumar Kartikeya Dwivedi
> <memxor@xxxxxxxxx> wrote:
> >
> > This is the first RFC version.
> >
> > This adds a bpf_link path to create TC filters tied to cls_bpf classifier, and
> > introduces fd based ownership for such TC filters. Netlink cannot delete or
> > replace such filters, but the bpf_link is severed on indirect destruction of the
> > filter (backing qdisc being deleted, or chain being flushed, etc.). To ensure
> > that filters remain attached beyond process lifetime, the usual bpf_link fd
> > pinning approach can be used.
>
> I have some troubles understanding this. So... why TC filter is so special
> here that it deserves such a special treatment?
>

So the motivation behind this was automatic cleanup of filters installed by some
program. Usually from the userspace side you need a bunch of things (handle,
priority, protocol, chain_index, etc.) to be able to delete a filter without
stepping on others' toes. Also, there is no gurantee that filter hasn't been
replaced, so you need to check some other way (either tag or prog_id, but these
are also not perfect).

bpf_link provides isolation from netlink and fd-based lifetime management. As
for why it needs special treatment (by which I guess you mean why it _creates_
an object instead of simply attaching to one, see below):

> The reason why I ask is that none of other bpf links actually create any
> object, they merely attach bpf program to an existing object. For example,
> netns bpf_link does not create an netns, cgroup bpf_link does not create
> a cgroup either. So, why TC filter is so lucky to be the first one requires
> creating an object?
>

They are created behind the scenes, but are also fairly isolated from netlink
(i.e.  can only be introspected, so not hidden in that sense, but are
effectively locked for replace/delete).

The problem would be (of not creating a filter during attach) is that a typical
'attach point' for TC exists in form of tcf_proto. If we take priority (protocol
is fixed) out of the equation, it becomes possible to attach just a single BPF
prog, but that seems like a needless limitation when TC already supports list of
filters at each 'attach point'.

My point is that the created object (the tcf_proto under the 'chain' object) is
the attach point, and since there can be so many, keeping them around at all
times doesn't make sense, so the refcounted attach locations are created as
needed.  Both netlink and bpf_link owned filters can be attached under the same
location, with different ownership story in userspace.

> Is it because there is no fd associated with any TC object?  Or what?
> TC object, like all other netlink stuffs, is not fs based, hence does not
> have an fd. Or maybe you don't need an fd at all? Since at least xdp
> bpf_link is associated with a netdev which does not have an fd either.
>
> >
> > The individual patches contain more details and comments, but the overall kernel
> > API and libbpf helper mirrors the semantics of the netlink based TC-BPF API
> > merged recently. This means that we start by always setting direct action mode,
> > protocol to ETH_P_ALL, chain_index as 0, etc. If there is a need for more
> > options in the future, they can be easily exposed through the bpf_link API in
> > the future.
>
> As you already see, this fits really oddly into TC infrastructure, because
> TC qdisc/filter/action are a whole subsystem, here you are trying to punch
> a hole in the middle. ;) This usually indicates that we are going in a wrong
> direction, maybe your case is an exception, but I can't find anything to justify
> it in your cover letter.
>

I don't see why I'm punching a hole. The qdisc, chain, protocol, priority is the
'attach location', handle is just an ID, maybe we can skip all this and just
create a static hook for attaching single BPF program that doesn't require
creating a filter, but someday someone will have to reimplement chaining of
programs again (like libxdp does).

> Even if you really want to go down this path (I still double), you probably
> want to explore whether there is any generic way to associate a TC object
> with an fd, because we have TC bpf action and we will have TC bpf qdisc
> too, I don't see any bpf_cls is more special than them.
>

I think TC bpf actions are not relevant going forward (due to cls_bpf's direct
action mode), but I could be wrong. I say so because even a proposed API to
attach these from libbpf was dropped because arguably cls_bpf does it better,
and people shouldn't be using integrated actions going forward.

TC bpf qdisc might be, but that can be a different attach type (say BPF_SCHED),
which if exposed through bpf_link will again have its attach point to be the
target_ifindex, not some fd, and it would still be possible to use this API to
attach to a eBPF qdisc.

What do you suggest? I am open to reworking this in a different way if there are
any better ideas.

> Thanks.

--
Kartikeya