Re: [RFC PATCH v7 1/8] net_sched: Introduce eBPF based Qdisc

Amery Hung <ameryhung@xxxxxxxxx> · Fri, 26 Jan 2024 17:17:26 -0800

On Thu, Jan 25, 2024 at 6:22 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 1/23/24 9:22 PM, Amery Hung wrote:
> >> I looked at the high level of the patchset. The major ops that it wants to be
> >> programmable in bpf is the ".enqueue" and ".dequeue" (+ ".init" and ".reset" in
> >> patch 4 and patch 5).
> >>
> >> This patch adds a new prog type BPF_PROG_TYPE_QDISC, four attach types (each for
> >> ".enqueue", ".dequeue", ".init", and ".reset"), and a new "bpf_qdisc_ctx" in the
> >> uapi. It is no long an acceptable way to add new bpf extension.
> >>
> >> Can the ".enqueue", ".dequeue", ".init", and ".reset" be completely implemented
> >> in bpf (with the help of new kfuncs if needed)? Then a struct_ops for Qdisc_ops
> >> can be created. The bpf Qdisc_ops can be loaded through the existing struct_ops api.
> >>
> > Partially. If using struct_ops, I think we'll need another structure
> > like the following in bpf qdisc to be implemented with struct_ops bpf:
> >
> > struct bpf_qdisc_ops {
> >      int (*enqueue) (struct sk_buff *skb)
> >      void (*dequeue) (void)
> >      void (*init) (void)
> >      void (*reset) (void)
> > };
> >
> > Then, Qdisc_ops will wrap around them to handle things that cannot be
> > implemented with bpf (e.g., sch_tree_lock, returning a skb ptr).
>
> We can see how those limitations (calling sch_tree_lock() and returning a ptr)
> can be addressed in bpf. This will also help other similar use cases.
>

For kptr, I wonder if we can support the following semantics in bpf if
they make sense:
1. Passing a referenced kptr into a bpf program, which will also need
to be released, or exchanged into maps or allocated objects.
2. Returning a kptr from a program and treating it as releasing the reference.

> Other than sch_tree_lock and returning a ptr from a bpf prog. What else do you
> see that blocks directly implementing the enqueue/dequeue/init/reset in the
> struct Qdisc_ops?
>

Not much! We can deal with sch_tree_lock later since
enqueue/dequeue/init/reset are unlikely to use it.

> Have you thought above ".priv_size"? It is now fixed to sizeof(struct
> bpf_sched_data). It should be useful to allow the bpf prog to store its own data
> there?
>

Maybe we can let bpf qdiscs store statistics here and make it work
with netlink. I haven't explored much in how bpf qdiscs record and
share statistics with user space.

> >
> >> If other ops (like ".dump", ".dump_stats"...) do not have good use case to be
> >> programmable in bpf, it can stay with the kernel implementation for now and only
> >> allows the userspace to load the a bpf Qdisc_ops with .equeue/dequeue/init/reset
> >> implemented.
> >>
> >> You mentioned in the cover letter that:
> >> "Current struct_ops attachment model does not seem to support replacing only
> >> functions of a specific instance of a module, but I might be wrong."
> >>
> >> I assumed you meant allow bpf to replace only "some" ops of the Qdisc_ops? Yes,
> >> it can be done through the struct_ops's ".init_member". Take a look at
> >> bpf_tcp_ca_init_member. The kernel can assign the kernel implementation for
> >> ".dump" (for example) when loading the bpf Qdisc_ops.
> >>
> > I have no problem with partially replacing a struct, which like you
> > mentioned has been demonstrated by congestion control or sched_ext.
> > What I am not sure about is the ability to create multiple bpf qdiscs,
> > where each has different ".enqueue", ".dequeue", and so on. I like the
> > struct_ops approach and would love to try it if struct_ops support
> > this.
>
> The need for allowing different ".enqueue/.dequeue/..." bpf
> (BPF_PROG_TYPE_QDISC) programs loaded into different qdisc instances is because
> there is only one ".id == bpf" Qdisc_ops native kernel implementation which is
> then because of the limitation you mentioned above?
>
> Am I understanding your reason correctly on why it requires to load different
> bpf prog for different qdisc instances?
>
> If the ".enqueue/.dequeue/..." in the "struct Qdisc_ops" can be directly
> implemented in bpf prog itself, it can just load another bpf struct_ops which
> has a different ".enqueue/.dequeue/..." implementation:
>
> #> bpftool struct_ops register bpf_simple_fq_v1.bpf.o
> #> bpftool struct_ops register bpf_simple_fq_v2.bpf.o
> #> bpftool struct_ops register bpf_simple_fq_xyz.bpf.o
>
>  From reading the test bpf prog, I think the set is on a good footing. Instead
> of working around the limitation by wrapping the bpf prog in a predefined
> "struct Qdisc_ops sch_bpf_qdisc_ops", lets first understand what is missing in
> bpf and see how we could address them.
>

Thank you so much for the clarification. I had the wrong impression
since I was thinking about using a structure in the bpf qdisc for
struct_ops. It makes sense to try making "struct Qdisc_ops" work with
struct_ops. I will send the next patch set with struct_ops.

Thanks,
Amery

>