Amery Hung <ameryhung@xxxxxxxxx> writes:

> Hi all,
>
> This patchset aims to support implementing qdisc using bpf struct_ops.
> This version takes a step back and only implements the minimum support
> for bpf qdisc; 1) support for adding an skb to a bpf_list or bpf_rbtree
> directly and 2) classful qdisc support are deferred to future patchsets.
>
> * Overview *
>
> This series supports implementing qdisc using bpf struct_ops. bpf qdisc
> aims to be a flexible and easy-to-use infrastructure that allows users
> to quickly experiment with different scheduling algorithms/policies. It
> only requires users to implement the core qdisc logic using bpf and
> implements the mundane parts for them. In addition, the ability to
> easily communicate between the qdisc and other components will also
> bring new opportunities for new applications and optimizations.

This is very cool, and I'm thrilled to see this work getting closer to
being merged! :)

> * struct_ops changes *
>
> To make struct_ops work better with bpf qdisc, two new changes are
> introduced to bpf specifically for struct_ops programs. First, we
> introduce the "__ref" postfix for arguments in stub functions in
> patches 1-2. It allows Qdisc_ops->enqueue to acquire a unique
> referenced kptr to the skb argument. Through the reference object
> tracking mechanism in the verifier, we can make sure that the acquired
> skb will be either enqueued or dropped. Besides, no duplicate
> references can be acquired. Then, we allow a referenced kptr to be
> returned from struct_ops programs so that we can return an skb
> naturally. This is done and tested in patches 3 and 4.
>
> * Performance of bpf qdisc *
>
> We tested several bpf qdiscs included in the selftests and their
> in-tree counterparts to give you a sense of the performance of qdiscs
> implemented in bpf.
>
> The implementation of bpf_fq is fairly complex and slightly different
> from fq, so later we only compare the two fifo qdiscs. bpf_fq
> implements the same fair queueing algorithm as fq, but without flow
> hash collision avoidance and garbage collection of inactive flows.
> bpf_fifo uses a single bpf_list as a queue instead of three queues for
> different priorities as in pfifo_fast. The time complexity of fifo,
> however, should be similar since the queue selection time is
> negligible.
>
> Test setup:
>
> client -> qdisc -------------> server
>           ~~~~~~~~~~~~~~~      ~~~~~~
>           nested VM1 @ DC1     VM2 @ DC2
>
> Throughput: iperf3 -t 600, 5 times
>
> Qdisc         Average (GBits/sec)
> ----------    -------------------
> pfifo_fast         12.52 ± 0.26
> bpf_fifo           11.72 ± 0.32
> fq                 10.24 ± 0.13
> bpf_fq             11.92 ± 0.64
>
> Latency: sockperf pp --tcp -t 600, 5 times
>
> Qdisc         Average (usec)
> ----------    --------------
> pfifo_fast      244.58 ±  7.93
> bpf_fifo        244.92 ± 15.22
> fq              234.30 ± 19.25
> bpf_fq          221.34 ± 10.76
>
> Looking at the two fifo qdiscs, the 6.4% drop in throughput in the bpf
> implementation is consistent with the previous observation (the v8
> throughput test on a loopback device). It should be possible to
> mitigate this by supporting adding an skb to a bpf_list or bpf_rbtree
> directly in the future.

This looks pretty decent!
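For anyone reading along who hasn't dug into the selftests yet, my
mental model of what the fifo flavour looks like with the __ref argument
and the kptr return in place is the sketch below. This is written from
memory rather than lifted from the series, so treat the kfunc name
(bpf_qdisc_skb_drop()), the bpf_sk_buff_ptr type and the exact
.enqueue/.dequeue signatures as approximations. The point is just that
the skb reference handed to .enqueue has to end up either stashed in the
list or explicitly dropped on every path, and that .dequeue can hand a
referenced skb back to the stack:

/* Sketch only: kfunc and struct names below are my recollection of the
 * series, not copied verbatim.
 */
#include <vmlinux.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include "bpf_experimental.h" /* bpf_obj_new(), bpf_list_*(), bpf_obj_drop() */

#define NET_XMIT_SUCCESS 0x00
#define NET_XMIT_DROP    0x01

#ifndef container_of
#define container_of(ptr, type, member) \
	((type *)((void *)(ptr) - __builtin_offsetof(type, member)))
#endif

/* Keep the lock and the list head in the same section so the verifier
 * knows which lock protects the list.
 */
#define private(name) SEC(".data." #name) __hidden __attribute__((aligned(8)))

/* One allocated node per queued packet; the skb itself is held as a kptr. */
struct skb_node {
	struct sk_buff __kptr *skb;
	struct bpf_list_node node;
};

private(FIFO) struct bpf_spin_lock fifo_lock;
private(FIFO) struct bpf_list_head fifo __contains(skb_node, node);

/* Assumed kfunc for handing a to-be-dropped skb back to the kernel. */
extern void bpf_qdisc_skb_drop(struct sk_buff *skb,
			       struct bpf_sk_buff_ptr *to_free) __ksym;

SEC("struct_ops/bpf_fifo_enqueue")
int BPF_PROG(bpf_fifo_enqueue, struct sk_buff *skb, struct Qdisc *sch,
	     struct bpf_sk_buff_ptr *to_free)
{
	struct skb_node *skbn;

	/* skb arrives as a referenced kptr (the __ref argument); the verifier
	 * rejects the program unless every path either stores or drops it.
	 * No queue limit, stats or .reset handling, to keep the sketch short.
	 */
	skbn = bpf_obj_new(typeof(*skbn));
	if (!skbn) {
		bpf_qdisc_skb_drop(skb, to_free);
		return NET_XMIT_DROP;
	}

	/* Move the skb reference into the list node. */
	skb = bpf_kptr_xchg(&skbn->skb, skb);
	if (skb) /* a fresh node holds no old skb, but the verifier must see this */
		bpf_qdisc_skb_drop(skb, to_free);

	bpf_spin_lock(&fifo_lock);
	bpf_list_push_back(&fifo, &skbn->node);
	bpf_spin_unlock(&fifo_lock);

	return NET_XMIT_SUCCESS;
}

SEC("struct_ops/bpf_fifo_dequeue")
struct sk_buff *BPF_PROG(bpf_fifo_dequeue, struct Qdisc *sch)
{
	struct bpf_list_node *node;
	struct sk_buff *skb = NULL;
	struct skb_node *skbn;

	bpf_spin_lock(&fifo_lock);
	node = bpf_list_pop_front(&fifo);
	bpf_spin_unlock(&fifo_lock);
	if (!node)
		return NULL;

	/* Take the skb reference back out of the node, free the node, and
	 * return the referenced kptr -- which is what patches 3-4 allow.
	 */
	skbn = container_of(node, struct skb_node, node);
	skb = bpf_kptr_xchg(&skbn->skb, skb);
	bpf_obj_drop(skbn);

	return skb;
}

SEC(".struct_ops.link")
struct Qdisc_ops fifo_ops = {
	.enqueue = (void *)bpf_fifo_enqueue,
	.dequeue = (void *)bpf_fifo_dequeue,
	.id      = "bpf_fifo",
};

char _license[] SEC("license") = "GPL";

Assuming I've remembered that roughly right, having the verifier enforce
the "reference in, reference out" contract on enqueue/dequeue is a
really nice property to get for free.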
> * Clean up skb in bpf qdisc during reset *
>
> The current implementation relies on bpf qdisc implementors to
> correctly release skbs in queues (bpf graphs or maps) in .reset, which
> might not be a safe thing to do. The solution, as Martin has suggested,
> would be supporting private data in struct_ops. This can also help
> simplify the implementation of qdiscs that work with mq. For example,
> qdiscs in the selftest mostly use global data. Therefore, even if users
> add multiple qdisc instances under mq, they would still share the same
> queue.

So is the plan to fix this before merging this series? Seems like a bit
of a footgun, otherwise?

-Toke