On Sat, 29 May 2021 09:44:57 +0800 Yunsheng Lin wrote: > >> @@ -3852,10 +3852,32 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, > >> qdisc_calculate_pkt_len(skb, q); > >> > >> if (q->flags & TCQ_F_NOLOCK) { > >> + if (q->flags & TCQ_F_CAN_BYPASS && nolock_qdisc_is_empty(q) && > >> + qdisc_run_begin(q)) { > >> + /* Retest nolock_qdisc_is_empty() within the protection > >> + * of q->seqlock to ensure qdisc is indeed empty. > >> + */ > >> + if (unlikely(!nolock_qdisc_is_empty(q))) { > > > > This is just for the DRAINING case right? > > > > MISSED can be set at any moment, testing MISSED seems confusing. > > MISSED is only set when there is lock contention, which means it > is better not to do the qdisc bypass to avoid out of order packet > problem, Avoid as in make less likely? Nothing guarantees other thread is not interrupted after ->enqueue and before qdisc_run_begin(). TBH I'm not sure what out-of-order situation you're referring to, there is no ordering guarantee between separate threads trying to transmit AFAIU. IOW this check is not required for correctness, right? > another good thing is that we could also do the batch > dequeuing and transmiting of packets when there is lock contention. No doubt, but did you see the flag get set significantly often here to warrant the double-checking? > > Is it really worth the extra code? > > Currently DRAINING is only set for the netdev queue stopped. > We could only use DRAINING to indicate the non-empty of a qdisc, > then we need to set the DRAINING evrywhere MISSED is set, that is > why I use both DRAINING and MISSED to indicate a non-empty qdisc. > > > > >> + rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK; > >> + __qdisc_run(q); > >> + qdisc_run_end(q); > >> + > >> + goto no_lock_out; > >> + } > >> + > >> + qdisc_bstats_cpu_update(q, skb); > >> + if (sch_direct_xmit(skb, q, dev, txq, NULL, true) && > >> + !nolock_qdisc_is_empty(q)) > >> + __qdisc_run(q); > >> + > >> + qdisc_run_end(q); > >> + return NET_XMIT_SUCCESS; > >> + } > >> + > >> rc = q->enqueue(skb, q, &to_free) & NET_XMIT_MASK; > >> - if (likely(!netif_xmit_frozen_or_stopped(txq))) > >> - qdisc_run(q); > >> + qdisc_run(q); > >> > >> +no_lock_out: > >> if (unlikely(to_free)) > >> kfree_skb_list(to_free); > >> return rc; > >> @@ -164,9 +166,13 @@ static inline void dev_requeue_skb(struct sk_buff *skb, struct Qdisc *q) > >> > >> skb = next; > >> } > >> - if (lock) > >> + > >> + if (lock) { > >> spin_unlock(lock); > >> - __netif_schedule(q); > >> + set_bit(__QDISC_STATE_MISSED, &q->state); > >> + } else { > >> + __netif_schedule(q); > > > > Could we reorder qdisc_run_begin() with clear_bit(SCHED) > > in net_tx_action() and add SCHED to the NON_EMPTY mask? > > Did you mean clearing the SCHED after the q->seqlock is > taken? > > The problem is that the SCHED is also used to indicate > a qdisc is in sd->output_queue or not, and the > qdisc_run_begin() called by net_tx_action() can not > guarantee it will take the q->seqlock(we are using trylock > for lockless qdisc) Ah, right. We'd need to do some more flag juggling in net_tx_action() to get it right. > >> + } > >> } > >> > >> static void try_bulk_dequeue_skb(struct Qdisc *q, > >> @@ -409,7 +415,11 @@ void __qdisc_run(struct Qdisc *q) > >> while (qdisc_restart(q, &packets)) { > >> quota -= packets; > >> if (quota <= 0) { > >> - __netif_schedule(q); > >> + if (q->flags & TCQ_F_NOLOCK) > >> + set_bit(__QDISC_STATE_MISSED, &q->state); > >> + else > >> + __netif_schedule(q); > >> + > >> break; > >> } > >> } > >> @@ -680,13 +690,14 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc) > >> if (likely(skb)) { > >> qdisc_update_stats_at_dequeue(qdisc, skb); > >> } else if (need_retry && > >> - test_bit(__QDISC_STATE_MISSED, &qdisc->state)) { > >> + READ_ONCE(qdisc->state) & QDISC_STATE_NON_EMPTY) { > > > > Do we really need to retry based on DRAINING being set? > > Or is it just a convenient way of coding things up? > > Yes, it is just a convenient way of coding things up. > Only MISSED need retrying.