On 2021/3/16 16:15, Eric Dumazet wrote: > On Tue, Mar 16, 2021 at 1:35 AM Yunsheng Lin <linyunsheng@xxxxxxxxxx> wrote: >> >> On 2021/3/16 2:53, Jakub Kicinski wrote: >>> On Mon, 15 Mar 2021 11:10:18 +0800 Yunsheng Lin wrote: >>>> @@ -606,6 +623,11 @@ static const u8 prio2band[TC_PRIO_MAX + 1] = { >>>> */ >>>> struct pfifo_fast_priv { >>>> struct skb_array q[PFIFO_FAST_BANDS]; >>>> + >>>> + /* protect against data race between enqueue/dequeue and >>>> + * qdisc->empty setting >>>> + */ >>>> + spinlock_t lock; >>>> }; >>>> >>>> static inline struct skb_array *band2list(struct pfifo_fast_priv *priv, >>>> @@ -623,7 +645,10 @@ static int pfifo_fast_enqueue(struct sk_buff *skb, struct Qdisc *qdisc, >>>> unsigned int pkt_len = qdisc_pkt_len(skb); >>>> int err; >>>> >>>> - err = skb_array_produce(q, skb); >>>> + spin_lock(&priv->lock); >>>> + err = __ptr_ring_produce(&q->ring, skb); >>>> + WRITE_ONCE(qdisc->empty, false); >>>> + spin_unlock(&priv->lock); >>>> >>>> if (unlikely(err)) { >>>> if (qdisc_is_percpu_stats(qdisc)) >>>> @@ -642,6 +667,7 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc) >>>> struct sk_buff *skb = NULL; >>>> int band; >>>> >>>> + spin_lock(&priv->lock); >>>> for (band = 0; band < PFIFO_FAST_BANDS && !skb; band++) { >>>> struct skb_array *q = band2list(priv, band); >>>> >>>> @@ -655,6 +681,7 @@ static struct sk_buff *pfifo_fast_dequeue(struct Qdisc *qdisc) >>>> } else { >>>> WRITE_ONCE(qdisc->empty, true); >>>> } >>>> + spin_unlock(&priv->lock); >>>> >>>> return skb; >>>> } >>> >>> I thought pfifo was supposed to be "lockless" and this change >>> re-introduces a lock between producer and consumer, no? >> >> Yes, the lock breaks the "lockless" of the lockless qdisc for now >> I do not how to solve the below data race locklessly: >> >> CPU1: CPU2: >> dequeue skb . >> . . >> . enqueue skb >> . . >> . WRITE_ONCE(qdisc->empty, false); >> . . >> . . >> WRITE_ONCE(qdisc->empty, true); > > > Maybe it is time to fully document/explain how this can possibly work. I might be able to provide some document/explain on how the lockless qdisc work for I was looking through the code the past few days. By "lockless", I suppose it means there is no lock between enqueuing and dequeuing, whoever grasps the qdisc->seqlock can dequeue the skb and send it out after enqueuing the skb it tries to send, other CPU which does not grasp the qdisc->seqlock just enqueue the skb and return, hoping other cpu holding the qdisc->seqlock can dequeue it's skb and send it out. For the locked qdisck(the one without TCQ_F_NOLOCK flags set), it holds the qdisc_lock() when doing the enqueuing/dequeuing and sch_direct_xmit(), and in sch_direct_xmit() it releases the qdisc_lock() when doing skb validation and calling dev_hard_start_xmit() to send skb to the driver, and hold the qdisc_lock() again after calling dev_hard_start_xmit(), so other cpu may grasps the qdisc_lock() and repeat the above process during that qdisc_lock() release period. So the main difference between lockess qdisc and locked qdisc is if there is a lock to protect both enqueuing and dequeuing. For example, pfifo_fast uses ptr_ring to become lockless for enqueuing or dequeuing, but it still needs producer_lock to protect the concurrent enqueue, and consumer_lock to protect the concurrent dequeue. for Other qdisc that can not provide the lockless between enqueuing or dequeuing, maybe we implement the locking in the specific qdisc implementation, so that it still can claim to be "lockless", like the locking added for pfifo_fast in this patch. Idealy we can claim all qdisc to be lockess qdisc as long as we make sure all qdisc either use lockless implementation, or use internal lock to protect between enqueuing and dequeuing, so that we can remove the locked dqisc and will have less contention between enqueuing and dequeuing. Below patch is the first try to remove the difference between lockess qdisc and locked qdisc, so that we can see if we can remove the locked qdisc in the future: https://patchwork.kernel.org/project/netdevbpf/patch/1615800610-34700-1-git-send-email-linyunsheng@xxxxxxxxxx/ I may be wrong about the above analysis, if there is any error, please point it out. > > lockless qdisc used concurrently by multiple cpus, using > WRITE_ONCE() and READ_ONCE() ? > > Just say no to this. what do you mean "no" here? Do you mean it is impossible to implement lockless qdisc correctly? Or TCQ_F_CAN_BYPASS for lockless qdisc is a wrong direction? And is there any reason? > >> >> If the above happens, the qdisc->empty is true even if the qdisc has some >> skb, which may cuase out of order or packet stuck problem. >> >> It seems we may need to update ptr_ring' status(empty or not) while >> enqueuing/dequeuing atomically in the ptr_ring implementation. >> >> Any better idea? > > . >