On 2021-09-03 10:44 a.m., Toke Høiland-Jørgensen wrote:
Martin KaFai Lau <kafai@xxxxxx> writes:
On Fri, Sep 03, 2021 at 12:27:52AM +0200, Toke Høiland-Jørgensen wrote:
The question is if it's useful to provide the full struct_ops for
qdiscs? Having it would allow a BPF program to implement that interface
towards userspace (things like statistics, classes etc), but the
question is if anyone is going to bother with that given the wealth of
BPF-specific introspection tools already available?
Rather than having bpftool introspect only bpf qdiscs and the existing
tc introspect only kernel qdiscs, it would be nice to have a bpf
qdisc work like any other qdisc and show its details together with the
others in tc. E.g. a bpf qdisc could export its data/stats with its
btf-id to tc and have tc print them out in a generic way?
I'm not opposed to the idea, certainly. I just wonder if people who go
to the trouble of writing a custom qdisc in BPF will feel it's worth it
to do the extra work to make this available via a second API. We could
certainly encourage it, and some things are easy (drop and pkt counters,
etc), but other things (like class stats) will depend on the semantics
of the qdisc being implemented, so will require extra work from the BPF
qdisc developer...
The idea of using btf to overcome the domain difference is _very_
appealing, but it sounds like a lot of work? I haven't delved enough
into btf, but I wonder if the same could be said for filters
and actions... Note:
Aside from the existing tooling being well understood, one
challenge you will face is reinventing all the
infrastructure that tc qdiscs have taken care of over the years,
for example:
the proper integration with softirqs and multiprocessor protections,
irqs, timers etc., which take care of smooth triggering of
enqueue/dequeue, deferring things when the target
device/hw is busy, hierarchies, etc.;
I'm not saying it is perfect or the most performant, but it is one of
those 'day 3' deployments, i.e. a lot of corner cases are already
taken care of.
I noticed you mentioned some of those things in one of your emails.
For this reason Cong's approach looks appealing, because it
reuses said infra. The main thing that needs extensibility is
the enqueue/dequeue ops, as ebpf progs. Allowing enq/deq to be ebpf
specific sounds like it will allow one scheme that works for both tc
and XDP (with enq/deq taking care of the buffer contextual differences).
I admit XDP is a little harder than plain tc....
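
To make the split concrete, here is a minimal userspace sketch in plain C
(not kernel or eBPF code) of the "reuse the infra, plug in only enq/deq"
idea. Every name in it (struct pkt, struct sched, struct sched_ops,
sched_enqueue() and so on) is hypothetical and made up for illustration;
none of it is taken from Cong's patches or from the kernel's Qdisc_ops.
The shared wrapper keeps the counters, and the two hooks are the only
part a scheduler author would supply:

```c
#include <assert.h>
#include <stddef.h>

#define QLEN 8                      /* arbitrary queue limit for the sketch */

struct pkt { int id; int prio; };   /* hypothetical stand-in for skb/xdp_frame */

struct sched;

/* The only pluggable part: in the proposal these would be eBPF progs. */
struct sched_ops {
	int (*enqueue)(struct sched *s, struct pkt p);     /* 0 ok, -1 drop  */
	int (*dequeue)(struct sched *s, struct pkt *out);  /* 0 ok, -1 empty */
};

struct sched {
	const struct sched_ops *ops;
	struct pkt q[QLEN];
	size_t len;
	unsigned long pkts, drops;      /* kept by the shared infra, not the hooks */
};

/* Shared infra: stats bookkeeping wrapped around the pluggable hooks. */
static int sched_enqueue(struct sched *s, struct pkt p)
{
	assert(s && s->ops && s->ops->enqueue);
	int ret = s->ops->enqueue(s, p);
	if (ret < 0)
		s->drops++;
	else
		s->pkts++;
	return ret;
}

static int sched_dequeue(struct sched *s, struct pkt *out)
{
	assert(s && s->ops && s->ops->dequeue);
	return s->ops->dequeue(s, out);
}

/* Helper: remove entry idx, keeping the remaining packets in order. */
static struct pkt queue_remove(struct sched *s, size_t idx)
{
	struct pkt p = s->q[idx];
	for (size_t i = idx + 1; i < s->len; i++)
		s->q[i - 1] = s->q[i];
	s->len--;
	return p;
}

/* Tail-drop enqueue, shared by both example schedulers. */
static int tail_enqueue(struct sched *s, struct pkt p)
{
	if (s->len == QLEN)
		return -1;
	s->q[s->len++] = p;
	return 0;
}

/* FIFO: hand back the oldest packet. */
static int fifo_dequeue(struct sched *s, struct pkt *out)
{
	if (s->len == 0)
		return -1;
	*out = queue_remove(s, 0);
	return 0;
}

/* Strict priority: hand back the packet with the lowest prio value. */
static int prio_dequeue(struct sched *s, struct pkt *out)
{
	if (s->len == 0)
		return -1;
	size_t best = 0;
	for (size_t i = 1; i < s->len; i++)
		if (s->q[i].prio < s->q[best].prio)
			best = i;
	*out = queue_remove(s, best);
	return 0;
}

static const struct sched_ops fifo_ops = { tail_enqueue, fifo_dequeue };
static const struct sched_ops prio_ops = { tail_enqueue, prio_dequeue };
```

A caller would set up `struct sched s = { .ops = &prio_ops };` and go
through sched_enqueue()/sched_dequeue() only. Swapping fifo_ops for
prio_ops changes the scheduling policy while the counters (and, in the
real thing, the locking, timers and softirq handling) stay in the shared
code.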
cheers,
jamal