Re: [RFC PATCH v7 1/8] net_sched: Introduce eBPF based Qdisc

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 1/23/24 9:22 PM, Amery Hung wrote:
I looked at the high level of the patchset. The major ops that it wants to be
programmable in bpf is the ".enqueue" and ".dequeue" (+ ".init" and ".reset" in
patch 4 and patch 5).

This patch adds a new prog type BPF_PROG_TYPE_QDISC, four attach types (each for
".enqueue", ".dequeue", ".init", and ".reset"), and a new "bpf_qdisc_ctx" in the
uapi. It is no long an acceptable way to add new bpf extension.

Can the ".enqueue", ".dequeue", ".init", and ".reset" be completely implemented
in bpf (with the help of new kfuncs if needed)? Then a struct_ops for Qdisc_ops
can be created. The bpf Qdisc_ops can be loaded through the existing struct_ops api.

Partially. If using struct_ops, I think we'll need another structure
like the following in bpf qdisc to be implemented with struct_ops bpf:

struct bpf_qdisc_ops {
     int (*enqueue) (struct sk_buff *skb)
     void (*dequeue) (void)
     void (*init) (void)
     void (*reset) (void)
};

Then, Qdisc_ops will wrap around them to handle things that cannot be
implemented with bpf (e.g., sch_tree_lock, returning a skb ptr).

We can see how those limitations (calling sch_tree_lock() and returning a ptr) can be addressed in bpf. This will also help other similar use cases.

Other than sch_tree_lock and returning a ptr from a bpf prog. What else do you see that blocks directly implementing the enqueue/dequeue/init/reset in the struct Qdisc_ops?

Have you thought above ".priv_size"? It is now fixed to sizeof(struct bpf_sched_data). It should be useful to allow the bpf prog to store its own data there?


If other ops (like ".dump", ".dump_stats"...) do not have good use case to be
programmable in bpf, it can stay with the kernel implementation for now and only
allows the userspace to load the a bpf Qdisc_ops with .equeue/dequeue/init/reset
implemented.

You mentioned in the cover letter that:
"Current struct_ops attachment model does not seem to support replacing only
functions of a specific instance of a module, but I might be wrong."

I assumed you meant allow bpf to replace only "some" ops of the Qdisc_ops? Yes,
it can be done through the struct_ops's ".init_member". Take a look at
bpf_tcp_ca_init_member. The kernel can assign the kernel implementation for
".dump" (for example) when loading the bpf Qdisc_ops.

I have no problem with partially replacing a struct, which like you
mentioned has been demonstrated by congestion control or sched_ext.
What I am not sure about is the ability to create multiple bpf qdiscs,
where each has different ".enqueue", ".dequeue", and so on. I like the
struct_ops approach and would love to try it if struct_ops support
this.

The need for allowing different ".enqueue/.dequeue/..." bpf (BPF_PROG_TYPE_QDISC) programs loaded into different qdisc instances is because there is only one ".id == bpf" Qdisc_ops native kernel implementation which is then because of the limitation you mentioned above?

Am I understanding your reason correctly on why it requires to load different bpf prog for different qdisc instances?

If the ".enqueue/.dequeue/..." in the "struct Qdisc_ops" can be directly implemented in bpf prog itself, it can just load another bpf struct_ops which has a different ".enqueue/.dequeue/..." implementation:

#> bpftool struct_ops register bpf_simple_fq_v1.bpf.o
#> bpftool struct_ops register bpf_simple_fq_v2.bpf.o
#> bpftool struct_ops register bpf_simple_fq_xyz.bpf.o

From reading the test bpf prog, I think the set is on a good footing. Instead of working around the limitation by wrapping the bpf prog in a predefined "struct Qdisc_ops sch_bpf_qdisc_ops", lets first understand what is missing in bpf and see how we could address them.






[Index of Archives]     [Linux Samsung SoC]     [Linux Rockchip SoC]     [Linux Actions SoC]     [Linux for Synopsys ARC Processors]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]


  Powered by Linux