Re: [PATCH RFC bpf-next 0/7] Add bpf_link based TC-BPF API

Jamal Hadi Salim <jhs@xxxxxxxxxxxx> · Sun, 13 Jun 2021 16:27:16 -0400

Hi,

Sorry - but i havent kept up with the discussion so some of this
and it is possible I may be misunderstanding some things you mention
in passing below (example that you only support da mode or the classid 
being able to be handled differently etc).
XDP may not be the best model to follow since some things that exist
in the tc architecture(example ability to have multi-programs)
seem to be plumbed in later (mostly because the original design intent
for XDP was to make it simple and then deployment follow and more
features get added)

Integrating tc into libbpf is a definete bonus that allows with a
unified programmatic interface and a singular loading mechanism - but
it wasnt clear why we loose some features that tc provides; we have
them today with current tc based loading scheme. I certainly use the
non-da scheme because over time it became clear that complex
programs(not necessarily large code size) are a challenge with ebpf
and using existing tc actions is valuable.
Also, multiple priorities are  important for the same reason - you
can work around them in your singular ebpf program but sooner than
later you will run out "tricks".

We do have this monthly tc meetup every second monday of the month.
Unfortunately it is short notice since the next one is monday 12pm
eastern time. Maybe you can show up and a high bandwidth discussion
(aka voice) would help?

cheers,
jamal

On 2021-06-12 10:53 p.m., Kumar Kartikeya Dwivedi wrote:
On Fri, Jun 11, 2021 at 07:30:49AM IST, Cong Wang wrote:
On Tue, Jun 8, 2021 at 12:20 AM Kumar Kartikeya Dwivedi
<memxor@xxxxxxxxx> wrote:

So we're not really creating a qdisc here, we're just tying the filter (which in
the current semantics exists only while attached) to the bpf_link. The filter is
the attachment, so tying its lifetime to bpf_link makes sense. When you destroy
the bpf_link, the filter goes away too, which means classification at that
hook (parent/class) in the qdisc stops working. This is why creating the filter
from the bpf_link made sense to me.

I see why you are creating TC filters now, because you are trying to
force the lifetime of a bpf target to align with the bpf program itself.
The deeper reason seems to be that a cls_bpf filter looks so small
that it appears to you that it has nothing but a bpf_prog, right?

Yes, pretty much.

I offer two different views here:

1. If you view a TC filter as an instance as a netdev/qdisc/action, they
are no different from this perspective. Maybe the fact that a TC filter
resides in a qdisc makes a slight difference here, but like I mentioned, it
actually makes sense to let TC filters be standalone, qdisc's just have to
bind with them, like how we bind TC filters with standalone TC actions.

You propose something different below IIUC, but I explained why I'm wary of
these unbound filters. They seem to add a step to classifier setup for no real
benefit to the user (except keeping track of one more object and cleaning it
up with the link when done).

I understand that the filter is very much an object of its own and why keeping
them unbound makes sense, but for the user there is no real benefit of this
scheme (some things like classid attribute are contextual in that they make
sense to be set based on what parent we're attaching to).

These are all updated independently, despite some of them residing in
another. There should not be an exceptional TC filter which can not
be updated via `tc filter` command.

I see, but I'm mirroring what was done for XDP bpf_link.

Besides, flush still works, it's only that manipulating a filter managed by
bpf_link is not allowed, which sounds reasonable to me, given we're bringing
new ownership semantics here which didn't exist before with netlink, so it
doesn't make sense to allow netlink to simply invalidate the filter installed by
some other program.

You wouldn't do something like that for a cooperating setup, we're just
enforcing that using -EPERM (bpf_link is not allowed to replace netlink
installed filters either, so it goes both ways).

2. For cls_bpf specifically, it is also an instance, like all other TC filters.
You can update it in the same way: tc filter change [...] The only difference
is a bpf program can attach to such an instance. So you can view the bpf
program attached to cls_bpf as a property of it. From this point of view,
there is no difference with XDP to netdev, where an XDP program
attached to a netdev is also a property of netdev. A netdev can still
function without XDP. Same for cls_bpf, it can be just a nop returns
TC_ACT_SHOT (or whatever) if no ppf program is attached. Thus,
the lifetime of a bpf program can be separated from the target it
attaches too, like all other bpf_link targets. bpf_link is just a
supplement to `tc filter change cls_bpf`, not to replace it.

So this is different now, as in the filter is attached as usual but bpf_link
represents attachment of bpf prog to the filter itself, not the filter to the
qdisc.

To me it seems apart from not having to create filter, this would pretty much be
equivalent to where I hook the bpf_link right now?

TBF, this split doesn't really seem to be bringing anything to the table (except
maybe preserving netlink as the only way to manipulate filter properties) and
keeping filters as separate objects. I can understand your position but for the
user it's just more and more objects to keep track of with no proper
ownership/cleanup semantics.

Though considering it for cls_bpf in particular, there are mainly three things
you would want to tc filter change:

* Integrated actions
   These are not allowed anyway, we force enable direct action mode, and I don't
   plan on opening up actions for this if its gets accepted. Anything missing
   we'll try to make it work in eBPF (act_ct etc.)

* classid
   cls_bpf has a good alternative of instead manipulating __sk_buff::tc_classid

* skip_hw/skip_sw
   Not supported for now, but can be done using flags in BPF_LINK_UPDATE

* BPF program
   Already works using BPF_LINK_UPDATE

So bpf_link isn't really prohibitive in any way.

Doing it your way also complicates cleanup of the filter (in case we don't want
to leave it attached), because it is hard to know who closes the link_fd last.
Closing it earlier would break the link for existing users, not doing it would
leave around unused object (which can accumulate if we use auto allocation of
filter priority). Counting existing links is racy.

This is better done in the kernel than worked around in userspace, as part of
attachment.

This is actually simpler, you do not need to worry about whether
netdev is destroyed when you detach the XDP bpf_link anyway,
same for cls_bpf filters. Likewise, TC filters don't need to worry
about bpf_links associated.

Thanks.

--
Kartikeya