On Sun, 30 May 2021 09:37:09 +0800 Yunsheng Lin wrote:
> On 2021/5/30 2:49, Jakub Kicinski wrote:
> > The fact that MISSED is only cleared under q->seqlock does not matter,
> > because setting it and ->enqueue() are not under any lock. If the thread
> > gets interrupted between:
> >
> >     if (q->flags & TCQ_F_CAN_BYPASS && nolock_qdisc_is_empty(q) &&
> >         qdisc_run_begin(q)) {
> >
> > and ->enqueue() we can't guarantee that something else won't come in,
> > take q->seqlock and clear MISSED.
> >
> >   thread1             thread2             thread3
> >                                           # holds seqlock
> >   qdisc_run_begin(q)
> >     set(MISSED)
> >                                           pfifo_fast_dequeue
> >                                             clear(MISSED)
> >                                             # recheck the queue
> >                                           qdisc_run_end()
> >   ->enqueue()
> >                       q->flags & TCQ_F_CAN_BYPASS..
> >                       qdisc_run_begin() # true
> >                       sch_direct_xmit()
> >   qdisc_run_begin()
> >     set(MISSED)
> >
> > Or am I missing something?
> >
> > Re-checking nolock_qdisc_is_empty() may or may not help.
> > But it doesn't really matter because there is no ordering
> > requirement between thread2 and thread3 here.
>
> I was more focused on explaining that using MISSED is as reliable
> as the sch_may_need_requeuing() check in RFCv3 [1] for indicating an
> empty qdisc, and forgot to mention the data race described in
> RFCv3, which is similar to the one described above:
>
> "There is a data race as below:
>
>       CPU1                                   CPU2
>   qdisc_run_begin(q)                           .
>           .                              q->enqueue()
>   sch_may_need_requeuing()                     .
>     return true                                .
>           .                                    .
>           .                                    .
>   q->enqueue()                                 .
>
> When the above happens, the skb enqueued by CPU1 is dequeued after the
> skb enqueued by CPU2 because sch_may_need_requeuing() returned true.
> Without qdisc bypass, CPU1 has a better chance of queueing
> its skb sooner than CPU2.
> This patch does not take care of the above data race, because I
> view it as similar to the following:
>
> Even when CPU1 and CPU2 write skbs at the same time to two sockets
> that both lead to the same qdisc, there is no guarantee as to which
> skb will hit the qdisc first, because many factors, such as
> interrupts, softirqs, cache misses, and scheduling, affect that."
>
> Does the above make sense? Or any idea how to avoid it?
>
> 1. https://patchwork.kernel.org/project/netdevbpf/patch/1616404156-11772-1-git-send-email-linyunsheng@xxxxxxxxxx/

We agree on this one.

Could you draw a sequence diagram of different CPUs (like the one
above) for the case where removing the re-check of
nolock_qdisc_is_empty() under q->seqlock leads to incorrect behavior?
If there is no such case, would you be willing to repeat the benchmark
with and without this test?

Sorry for dragging the review out..