Re: [PATCH RFC bpf-next 0/7] Add bpf_link based TC-BPF API

On 2021-06-18 6:42 p.m., Daniel Borkmann wrote:
On 6/18/21 1:40 PM, Jamal Hadi Salim wrote:

[..]
From a user interface PoV it's odd, since you need to go and parse that anyway; the programs typically start out with a switch/case on either reading skb->protocol or getting it via eth->h_proto. But once you extend that same program to also cover IPv6, with ETH_P_ALL you don't need to change anything in the loader application, whereas with ETH_P_IP you'd additionally have to remember to downgrade it to ETH_P_ALL and rebuild the loader to see v6 traffic. And even if you were to split the main/entry program so that v4/v6 processing goes into two different ones, I expect this to be faster via tail calls (given a direct absolute jump) than walking a list of tcf_proto objects, comparing the
tp->protocol and going into a different cls_bpf instance.
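
For illustration, a minimal sketch of the switch/case pattern described
above, assuming a cls_bpf program attached with ETH_P_ALL; the function
and section names are placeholders, not something from this series:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

static __always_inline int handle_v4(struct __sk_buff *skb)
{
	/* IPv4-specific processing would live here. */
	return TC_ACT_OK;
}

static __always_inline int handle_v6(struct __sk_buff *skb)
{
	/* IPv6-specific processing would live here. */
	return TC_ACT_OK;
}

SEC("classifier")
int entry(struct __sk_buff *skb)
{
	/* One dispatch on skb->protocol inside the program; attaching
	 * with ETH_P_ALL means adding v6 later only touches this
	 * switch, not the loader. */
	switch (bpf_ntohs(skb->protocol)) {
	case ETH_P_IP:
		return handle_v4(skb);
	case ETH_P_IPV6:
		return handle_v6(skb);
	default:
		return TC_ACT_OK;
	}
}

char _license[] SEC("license") = "GPL";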


Good point on being more future proof with ETH_P_ALL.
Note: in our case we were only interested in IPv4 and I don't see that
changing for the specific prog we have. From a compute perspective, all
I am saving by not using ETH_P_ALL is one if statement (checking whether
the proto is IPv4). If you feel strongly about it we can change our code.
My worry now is that if we used this approach, then likely someone else
in the wild has used something similar.

I think it boils down again to: if it doesn't confuse the API or add
extra complexity, why not allow it and default to ETH_P_ALL?

On your comment that a BPF-based proto comparison would be faster: the
issue is that the tp->protocol check always happens regardless, and
eBPF, depending on your program, may not be able to fit all your code.
For example, with the current mechanism I may actually decide to have
separate programs for v4 and v6 if I wanted to, at different tc ruleset
prios, just so as to work around code/complexity issues.
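
Roughly the kind of split I mean - just a sketch, the section names and
the iproute2 invocation below are illustrative, not our actual code:

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

/* Loaded as two independent cls_bpf instances, e.g. with the iproute2
 * loader at different prios:
 *   tc filter add dev eth0 ingress protocol ip   prio 10 bpf da obj prog.o sec ipv4
 *   tc filter add dev eth0 ingress protocol ipv6 prio 20 bpf da obj prog.o sec ipv6
 * so each program stays small enough to keep the verifier happy.
 */

SEC("ipv4")
int cls_v4(struct __sk_buff *skb)
{
	if (skb->protocol != bpf_htons(ETH_P_IP))
		return TC_ACT_OK;
	/* IPv4-only logic here */
	return TC_ACT_OK;
}

SEC("ipv6")
int cls_v6(struct __sk_buff *skb)
{
	if (skb->protocol != bpf_htons(ETH_P_IPV6))
		return TC_ACT_OK;
	/* IPv6-only logic here */
	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";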

BTW: the tail call limit of 32 provides an upper bound which affects the
depth of (generic) parsing.
Does it make sense to allow increasing that limit (maybe per-boot)?
The fact that things run on the stack may be what is restricting it.
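
For reference, a sketch of the tail-call pattern whose chain length that
limit caps; the map and program names here are made up:

#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_PROG_ARRAY);
	__uint(max_entries, 8);
	__type(key, __u32);
	__type(value, __u32);
} parse_stages SEC(".maps");

SEC("classifier")
int parse_step(struct __sk_buff *skb)
{
	/* Parse one header layer, then jump to the next stage. Each
	 * bpf_tail_call() counts against the limit of 32, which is what
	 * bounds the depth of a generic parser built this way. */
	bpf_tail_call(skb, &parse_stages, 0);
	/* Fall through if the slot is empty or the limit was hit. */
	return TC_ACT_OK;
}

char _license[] SEC("license") = "GPL";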


It may be more tricky but not impossible either; in recent years some (imho) very interesting and exciting use cases have been implemented and talked about, e.g. [0-2], and with the recent linker work there could also be a [e.g. in-kernel] collection of library code that can be pulled in by others, aside from using them as BPF selftests, as one option. The gain you have with the flexibility [as you know] is that it allows easy integration/orchestration into user space applications and is thus suitable for more dynamic envs than old-style actions. The issue I have with the latter is that they're not scalable enough from a SW datapath / tc fast-path perspective, given you then need to fall back to old-style list processing of cls+act combinations, which is also not covered / in scope for the libbpf API in terms of their setup; additionally, not all of the BPF features can be used this way either, so it'll be very hard for users to debug why their BPF programs don't work as they're expected to.

But also aside from those blockers, the case with this clean-slate tc BPF API is that we have a unique chance to overcome the cmdline usability struggles, and make it as
straightforward as possible for a new generation of users.

  [0] https://linuxplumbersconf.org/event/7/contributions/677/
  [1] https://linuxplumbersconf.org/event/2/contributions/121/
  [2] https://netdevconf.info/0x14/session.html?talk-replacing-HTB-with-EDT-and-BPF

I took a quick glance at the refs.

IIUC, your message is "do more with less", i.e. restrict choices now
so we can focus on optimizing for speed. Here's my experience.
We have two pragmatic challenges:

1) In a deployment, like some enterprise-class data centers, you are
often limited by the kernel and often even by the distro you are on. You
can't just upgrade to the latest and greatest without risking voiding
the distro vendor's support contract. Big shops with a lot of geniuses
like FB and Google don't have these problems of course - but the majority
out there do.

So even our little program must use supported interfaces to be accepted
(e.g. you can't expect support on RH8.3 for an XDP issue without using
the supplied XDP lib).

So building in support to use existing infra is useful.

2) Challenges with eBPF code space and code complexity: depending
on the complexity, even a program with fewer than 4K instructions may be
rejected by the verifier. IOW, I just can't add all the features
I need _even if I wanted to_.

For this reason, working cooperatively with other existing kernel
and user infra makes sense (ref [2] is doing that, for example).
You don't want to rewrite the kernel using eBPF; extending the kernel
with eBPF makes sense. And of course I don't want to lose performance,
but sometimes there is a trade-off where a little loss in performance
is justified by the gain of a feature
(the non-da example applies).

Perhaps adding more helpers to interface with the existing actions and
classifiers is one way forward.

cheers,
jamal

PS: I didn't understand the in-kernel linker point with respect to BPF
selftests. Pointer?


