Re: [PATCH RFC bpf-next 0/7] Add bpf_link based TC-BPF API

On 6/16/21 5:32 PM, Kumar Kartikeya Dwivedi wrote:
On Wed, Jun 16, 2021 at 08:10:55PM IST, Jamal Hadi Salim wrote:
On 2021-06-15 7:07 p.m., Daniel Borkmann wrote:
On 6/13/21 11:10 PM, Jamal Hadi Salim wrote:

[..]

I look at it from the perspective that if I can run something with the
existing tc loading mechanism, then I should be able to do the same
with the new (libbpf) scheme.

The intention is not to provide a full-blown tc library (that could be subject
to a libtc or such), but rather to only have libbpf abstract the tc-related API
that is most /relevant/ for BPF program development and /efficient/ in terms of
execution in the fast path, while at the same time providing a good user
experience from the API itself.

That is, simple to use and straightforward to explain to folks with otherwise
zero experience of tc. The current implementation does all that, and from
experience with large BPF programs managed via cls_bpf that is all that is
actually needed from the tc layer's perspective. The ability to have multiple
programs (incl. priorities) is in the existing libbpf API as well.
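
As a rough sketch, attaching two programs at different priorities through the
existing libbpf API looks something like the below (the ifindex and program
fds are placeholders obtained elsewhere):

#include <errno.h>
#include <bpf/libbpf.h>

/* Sketch only: prog_fd_hi/prog_fd_lo are fds of already-loaded programs. */
static int attach_two_progs(int ifindex, int prog_fd_hi, int prog_fd_lo)
{
	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook,
			    .ifindex = ifindex,
			    .attach_point = BPF_TC_INGRESS);
	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts_hi,
			    .prog_fd = prog_fd_hi,
			    .priority = 1);
	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts_lo,
			    .prog_fd = prog_fd_lo,
			    .priority = 2);
	int err;

	/* Create the clsact qdisc if it is not there yet. */
	err = bpf_tc_hook_create(&hook);
	if (err && err != -EEXIST)
		return err;

	/* Two filters on the same hook; the lower priority value runs first. */
	err = bpf_tc_attach(&hook, &opts_hi);
	if (err)
		return err;
	return bpf_tc_attach(&hook, &opts_lo);
}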

Which is a fair statement, but if you take away things that work fine
with the current iproute2 loading, I have no motivation to migrate at all.
It's like that saying about "throwing out the baby with the bathwater".
I want my baby.

In particular, here's a list of restrictions from Kartikeya's implementation:

1) Direct action mode only

(More below.)

2) Protocol ETH_P_ALL only

The issue I see with this one is that it's not very valuable or useful from a BPF
point of view. Meaning, this kind of check can be, and typically is, implemented
in the BPF program anyway. For example, with direct packet access you are initially
parsing the eth header anyway (and from there have logic for the various eth protos).

That protocol option is maybe more useful when you have a classic tc cls+act style
pipeline where you want a quick skip of classifiers to avoid reparsing the packet.
Given you can do everything inside the BPF program already, it adds more confusion
than value for a simple libbpf [tc/BPF] API.
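
For illustration, a minimal direct-action classifier that does the equivalent
protocol match itself (the program name and the IPv4-only handling are just
made up for this sketch):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("classifier")
int handle_ipv4_only(struct __sk_buff *skb)
{
	void *data = (void *)(long)skb->data;
	void *data_end = (void *)(long)skb->data_end;
	struct ethhdr *eth = data;

	/* Bounds check required by the verifier for direct packet access. */
	if ((void *)(eth + 1) > data_end)
		return TC_ACT_OK;

	/* Equivalent of a 'protocol ip' match, done inside the program. */
	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return TC_ACT_OK;

	/* ... IPv4-specific processing ... */
	return TC_ACT_OK;
}

char LICENSE[] SEC("license") = "GPL";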

3) Only at chain 0
4) No block support

Block is supported; you just need to set TCM_IFINDEX_MAGIC_BLOCK as the ifindex and
the parent as the block index. There isn't anything more to it than that from the
libbpf side (just specify the BPF_TC_CUSTOM enum).

What I meant was that hook_create doesn't support specifying the ingress/egress
block when creating clsact, but that typically isn't a problem because qdiscs
for shared blocks would be set up together prior to the attachment anyway.

I think he said priority is supported, but it was also originally on that
list. When we discussed this at the meetup, it didn't seem these cost anything
in terms of code complexity or usability of the API.

1) We use non-DA mode, so I can't live without that (and frankly, eBPF
has challenges adding complex code blocks).

Could you elaborate on that or provide code examples? Since the introduction of the
direct action mode I've never used anything else again, and we do have complex
BPF code blocks that we need to handle as well. It would be good if you could provide
more details on the things you ran into; maybe they can be solved?

2) We also use different protocols when I need to
(yes, you can do the filtering in the BPF code - but why impose that
if the cost of adding it is small? And of course it is cheaper to do
the check outside of eBPF).
3) We use chains outside of zero.

4) So far we don't use block support, but certainly my recent experience
in a deployment shows that we need to group netdevices more often than
I thought was necessary. So if I could express one map shared by
multiple netdevices, it should cut down the user space complexity.
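
A rough sketch of what that could look like with the current libbpf API (the
object file name, map name, pin path and program name below are made up for
illustration): pin the map once and reuse it when loading the object for each
netdevice, so all attached programs operate on the same map.

#include <errno.h>
#include <bpf/libbpf.h>

static int attach_dev_with_shared_map(int ifindex)
{
	DECLARE_LIBBPF_OPTS(bpf_tc_hook, hook, .ifindex = ifindex,
			    .attach_point = BPF_TC_INGRESS);
	DECLARE_LIBBPF_OPTS(bpf_tc_opts, opts, .priority = 1);
	struct bpf_object *obj;
	struct bpf_program *prog;
	struct bpf_map *map;
	int err;

	obj = bpf_object__open_file("tc_prog.o", NULL);
	if (libbpf_get_error(obj))
		return -EINVAL;

	/* Reuse the already-pinned map instead of creating one per device. */
	map = bpf_object__find_map_by_name(obj, "shared_stats");
	if (!map) {
		err = -ENOENT;
		goto out;
	}
	err = bpf_map__set_pin_path(map, "/sys/fs/bpf/tc/shared_stats");
	if (err)
		goto out;

	err = bpf_object__load(obj);
	if (err)
		goto out;

	prog = bpf_object__find_program_by_name(obj, "handle_ipv4_only");
	if (!prog) {
		err = -ENOENT;
		goto out;
	}
	opts.prog_fd = bpf_program__fd(prog);

	err = bpf_tc_hook_create(&hook);
	if (err && err != -EEXIST)
		goto out;
	err = bpf_tc_attach(&hook, &opts);
out:
	/* The tc filter holds its own reference to the program, so the
	 * object can be closed once the attach succeeded. */
	bpf_object__close(obj);
	return err;
}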

Thanks,
Daniel


