On Thu, Jan 25, 2024 at 6:22 PM Martin KaFai Lau <martin.lau@xxxxxxxxx> wrote:
>
> On 1/23/24 9:22 PM, Amery Hung wrote:
> >> I looked at the high level of the patchset. The major ops that it wants to be
> >> programmable in bpf is the ".enqueue" and ".dequeue" (+ ".init" and ".reset" in
> >> patch 4 and patch 5).
> >>
> >> This patch adds a new prog type BPF_PROG_TYPE_QDISC, four attach types (each for
> >> ".enqueue", ".dequeue", ".init", and ".reset"), and a new "bpf_qdisc_ctx" in the
> >> uapi. It is no long an acceptable way to add new bpf extension.
> >>
> >> Can the ".enqueue", ".dequeue", ".init", and ".reset" be completely implemented
> >> in bpf (with the help of new kfuncs if needed)? Then a struct_ops for Qdisc_ops
> >> can be created. The bpf Qdisc_ops can be loaded through the existing struct_ops api.
> >>
> > Partially. If using struct_ops, I think we'll need another structure
> > like the following in bpf qdisc to be implemented with struct_ops bpf:
> >
> > struct bpf_qdisc_ops {
> >     int (*enqueue) (struct sk_buff *skb)
> >     void (*dequeue) (void)
> >     void (*init) (void)
> >     void (*reset) (void)
> > };
> >
> > Then, Qdisc_ops will wrap around them to handle things that cannot be
> > implemented with bpf (e.g., sch_tree_lock, returning a skb ptr).
>
> We can see how those limitations (calling sch_tree_lock() and returning a ptr)
> can be addressed in bpf. This will also help other similar use cases.
>

For kptr, I wonder if we can support the following semantics in bpf if
they make sense:
1. Passing a referenced kptr into a bpf program, which will also need
   to be released, or exchanged into maps or allocated objects.
2. Returning a kptr from a program and treating it as releasing the
   reference.

> Other than sch_tree_lock and returning a ptr from a bpf prog. What else do you
> see that blocks directly implementing the enqueue/dequeue/init/reset in the
> struct Qdisc_ops?
>

Not much!
We can deal with sch_tree_lock later since
enqueue/dequeue/init/reset are unlikely to use it.

> Have you thought above ".priv_size"? It is now fixed to sizeof(struct
> bpf_sched_data). It should be useful to allow the bpf prog to store its own data
> there?
>

Maybe we can let bpf qdiscs store statistics here and make it work
with netlink. I haven't explored much in how bpf qdiscs record and
share statistics with user space.

> >
> >> If other ops (like ".dump", ".dump_stats"...) do not have good use case to be
> >> programmable in bpf, it can stay with the kernel implementation for now and only
> >> allows the userspace to load the a bpf Qdisc_ops with .equeue/dequeue/init/reset
> >> implemented.
> >>
> >> You mentioned in the cover letter that:
> >> "Current struct_ops attachment model does not seem to support replacing only
> >> functions of a specific instance of a module, but I might be wrong."
> >>
> >> I assumed you meant allow bpf to replace only "some" ops of the Qdisc_ops? Yes,
> >> it can be done through the struct_ops's ".init_member". Take a look at
> >> bpf_tcp_ca_init_member. The kernel can assign the kernel implementation for
> >> ".dump" (for example) when loading the bpf Qdisc_ops.
> >>
> > I have no problem with partially replacing a struct, which like you
> > mentioned has been demonstrated by congestion control or sched_ext.
> > What I am not sure about is the ability to create multiple bpf qdiscs,
> > where each has different ".enqueue", ".dequeue", and so on. I like the
> > struct_ops approach and would love to try it if struct_ops support
> > this.
>
> The need for allowing different ".enqueue/.dequeue/..." bpf
> (BPF_PROG_TYPE_QDISC) programs loaded into different qdisc instances is because
> there is only one ".id == bpf" Qdisc_ops native kernel implementation which is
> then because of the limitation you mentioned above?
>
> Am I understanding your reason correctly on why it requires to load different
> bpf prog for different qdisc instances?
>
> If the ".enqueue/.dequeue/..." in the "struct Qdisc_ops" can be directly
> implemented in bpf prog itself, it can just load another bpf struct_ops which
> has a different ".enqueue/.dequeue/..." implementation:
>
> #> bpftool struct_ops register bpf_simple_fq_v1.bpf.o
> #> bpftool struct_ops register bpf_simple_fq_v2.bpf.o
> #> bpftool struct_ops register bpf_simple_fq_xyz.bpf.o
>
> From reading the test bpf prog, I think the set is on a good footing. Instead
> of working around the limitation by wrapping the bpf prog in a predefined
> "struct Qdisc_ops sch_bpf_qdisc_ops", lets first understand what is missing in
> bpf and see how we could address them.
>

Thank you so much for the clarification. I had the wrong impression
since I was thinking about using a structure in the bpf qdisc for
struct_ops. It makes sense to try making "struct Qdisc_ops" work with
struct_ops. I will send the next patch set with struct_ops.

Thanks,
Amery
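
To make the model discussed above concrete (several independently
registered ops tables, each with its own enqueue/dequeue, and unset
members filled in with a default in the spirit of ".init_member"), here
is a rough userspace C sketch. It is an analogy only, not kernel or
libbpf code: struct model_qdisc_ops, register_ops(), lookup_ops(),
default_dump() and the fq_v1/fq_v2 toy schedulers are hypothetical
names made up for this illustration.

```c
/* Userspace analogy only; none of this is kernel or libbpf API. */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct pkt { int len; };

/* Hypothetical stand-in for a few members of struct Qdisc_ops. */
struct model_qdisc_ops {
	const char *id;
	int (*enqueue)(struct pkt *p);	/* 0 = queued, 1 = dropped */
	struct pkt *(*dequeue)(void);	/* NULL when the queue is empty */
	int (*dump)(void);		/* optional; default filled in */
};

#define MAX_OPS 8
static struct model_qdisc_ops registry[MAX_OPS];
static int nr_ops;

/* Plays the role of a kernel-provided implementation of an op. */
static int default_dump(void) { return 0; }

/* Register one ops table; an unset optional member gets the default. */
static int register_ops(const struct model_qdisc_ops *ops)
{
	if (nr_ops == MAX_OPS || !ops->id || !ops->enqueue || !ops->dequeue)
		return -1;
	for (int i = 0; i < nr_ops; i++)
		if (strcmp(registry[i].id, ops->id) == 0)
			return -1;	/* ids must be unique */
	registry[nr_ops] = *ops;
	if (!registry[nr_ops].dump)
		registry[nr_ops].dump = default_dump;
	nr_ops++;
	return 0;
}

static const struct model_qdisc_ops *lookup_ops(const char *id)
{
	for (int i = 0; i < nr_ops; i++)
		if (strcmp(registry[i].id, id) == 0)
			return &registry[i];
	return NULL;
}

/* Both toy schedulers share a one-packet slot to keep the sketch short. */
static struct pkt *slot;

static int v1_enqueue(struct pkt *p)
{
	if (slot)
		return 1;
	slot = p;
	return 0;
}

/* Same as v1, but additionally drops packets longer than 1500 bytes. */
static int v2_enqueue(struct pkt *p)
{
	if (slot || p->len > 1500)
		return 1;
	slot = p;
	return 0;
}

static struct pkt *slot_dequeue(void)
{
	struct pkt *p = slot;

	slot = NULL;
	return p;
}

static const struct model_qdisc_ops fq_v1 = {
	.id = "fq_v1", .enqueue = v1_enqueue, .dequeue = slot_dequeue,
};
static const struct model_qdisc_ops fq_v2 = {
	.id = "fq_v2", .enqueue = v2_enqueue, .dequeue = slot_dequeue,
};

/* Call once, e.g. from main(), to exercise the registry. */
static void demo(void)
{
	struct pkt big = { .len = 9000 };

	assert(register_ops(&fq_v1) == 0);
	assert(register_ops(&fq_v2) == 0);
	assert(register_ops(&fq_v1) == -1);	/* duplicate id rejected */
	assert(lookup_ops("fq_v1")->enqueue(&big) == 0);
	assert(lookup_ops("fq_v1")->dequeue() == &big);
	assert(lookup_ops("fq_v2")->enqueue(&big) == 1);
	assert(lookup_ops("fq_v2")->dump == default_dump);
}
```

The point of the sketch is only that "different enqueue/dequeue per
qdisc" falls out of registering several ops tables under different
ids, with no per-instance program attachment needed; how the real
struct_ops path would handle locking and returned skb pointers is
left open, as in the discussion above.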