On 1/24/24 1:09 PM, Jamal Hadi Salim wrote:
On Tue, Jan 23, 2024 at 4:13 PM Stanislav Fomichev <sdf@xxxxxxxxxx> wrote:
On 01/17, Amery Hung wrote:
Hi,
I am continuing the work of ebpf-based Qdisc based on Cong’s previous
RFC. The followings are some use cases of eBPF Qdisc:
1. Allow customizing Qdiscs in an easier way. So that people don't
have to write a complete Qdisc kernel module just to experiment
some new queuing theory.
2. Solve EDT's problem. EDT calcuates the "tokens" in clsact which
is before enqueue, it is impossible to adjust those "tokens" after
packets get dropped in enqueue. With eBPF Qdisc, it is easy to
be solved with a shared map between clsact and sch_bpf.
3. Replace qevents, as now the user gains much more control over the
skb and queues.
4. Provide a new way to reuse TC filters. Currently TC relies on filter
chain and block to reuse the TC filters, but they are too complicated
to understand. With eBPF helper bpf_skb_tc_classify(), we can invoke
TC filters on _any_ Qdisc (even on a different netdev) to do the
classification.
5. Potentially pave a way for ingress to queue packets, although
current implementation is still only for egress.
I’ve combed through previous comments and appreciated the feedbacks.
Some major changes in this RFC is the use of kptr to skb to maintain
the validility of skb during its lifetime in the Qdisc, dropping rbtree
maps, and the inclusion of two examples.
Some questions for discussion:
1. We now pass a trusted kptr of sk_buff to the program instead of
__sk_buff. This makes most helpers using __sk_buff incompatible
with eBPF qdisc. An alternative is to still use __sk_buff in the
context and use bpf_cast_to_kern_ctx() to acquire the kptr. However,
this can only be applied to enqueue program, since in dequeue program
skbs do not come from ctx but kptrs exchanged out of maps (i.e., there
is no __sk_buff). Any suggestion for making skb kptr and helper
functions compatible?
2. The current patchset uses netlink. Do we also want to use bpf_link
for attachment?
[..]
3. People have suggested struct_ops. We chose not to use struct_ops since
users might want to create multiple bpf qdiscs with different
implementations. Current struct_ops attachment model does not seem
to support replacing only functions of a specific instance of a module,
but I might be wrong.
I still feel like it deserves at leasta try. Maybe we can find some potential
path where struct_ops can allow different implementations (Martin probably
has some ideas about that). I looked at the bpf qdisc itself and it doesn't
really have anything complicated (besides trying to play nicely with other
tc classes/actions, but I'm not sure how relevant that is).
Are you suggesting that it is a nuisance to integrate with the
existing infra? I would consider it being a lot more than "trying to
play nicely". Besides, it's a kfunc and people will not be forced to
use it.
What's the use case? If you already go that route to implement your own
qdisc, why is there a need to take the performane hit and go all the
way into old style cls/act infra when it can be done in a more straight
forward way natively? For the vast majority of cases this will be some
very lightweight classification anyway (if not outsourced to tc egress
given the lock). If there is a concrete production need, it could be
added, otherwise if there is no immediate use case which cannot be solved
otherwise I would not add unnecessary kfuncs.
Cheers,
Daniel