On Sun, 30 May 2021 09:37:09 +0800 Yunsheng Lin wrote:
> On 2021/5/30 2:49, Jakub Kicinski wrote:
> > The fact that MISSED is only cleared under q->seqlock does not matter,
> > because setting it and ->enqueue() are not under any lock. If the thread
> > gets interrupted between:
> >
> >     if (q->flags & TCQ_F_CAN_BYPASS && nolock_qdisc_is_empty(q) &&
> >         qdisc_run_begin(q)) {
> >
> > and ->enqueue() we can't guarantee that something else won't come in,
> > take q->seqlock and clear MISSED.
> >
> >   thread1             thread2             thread3
> >                                           # holds seqlock
> >   qdisc_run_begin(q)
> >     set(MISSED)
> >                                           pfifo_fast_dequeue
> >                                             clear(MISSED)
> >                                             # recheck the queue
> >                                           qdisc_run_end()
> >   ->enqueue()
> >                       q->flags & TCQ_F_CAN_BYPASS..
> >                       qdisc_run_begin() # true
> >                       sch_direct_xmit()
> >   qdisc_run_begin()
> >     set(MISSED)
> >
> > Or am I missing something?
> >
> > Re-checking nolock_qdisc_is_empty() may or may not help.
> > But it doesn't really matter because there is no ordering
> > requirement between thread2 and thread3 here.
>
> I was more focused on explaining that using MISSED is as reliable
> as the sch_may_need_requeuing() check in RFCv3 [1] for indicating an
> empty qdisc, and forgot to mention the data race described in
> RFCv3, which is similar to the one described above:
>
> "There is a data race as below:
>
>       CPU1                                   CPU2
>   qdisc_run_begin(q)                           .
>           .                              q->enqueue()
>   sch_may_need_requeuing()                     .
>     return true                                .
>           .                                    .
>           .                                    .
>   q->enqueue()                                 .
>
> When the above happens, the skb enqueued by CPU1 is dequeued after the
> skb enqueued by CPU2 because sch_may_need_requeuing() returned true.
> Without qdisc bypass, CPU1 has a better chance of queueing
> its skb sooner than CPU2.
> This patch does not take care of the above data race, because I
> view it as similar to the following:
>
> Even when CPU1 and CPU2 write skbs at the same time to two sockets
> that both lead to the same qdisc, there is no guarantee as to which
> skb will hit the qdisc first, because many factors, such as
> interrupts, softirqs, cache misses, and scheduling, affect that."
>
> Does the above make sense? Or any idea how to avoid it?
>
> 1. https://patchwork.kernel.org/project/netdevbpf/patch/1616404156-11772-1-git-send-email-linyunsheng@xxxxxxxxxx/

We agree on this one.

Could you draw a sequence diagram of different CPUs (like the one
above) for the case where removing the re-check of
nolock_qdisc_is_empty() under q->seqlock leads to incorrect behavior?
If there is no such case, would you be willing to repeat the benchmark
with and without this test?

Sorry for dragging the review out..