Hi, On Sun, Dec 20, 2020 at 5:53 PM Miaoqing Pan <miaoqing@xxxxxxxxxxxxxx> wrote: > > Failed to transmit wmi management frames: > > [84977.840894] ath10k_snoc a000000.wifi: wmi mgmt tx queue is full > [84977.840913] ath10k_snoc a000000.wifi: failed to transmit packet, dropping: -28 > [84977.840924] ath10k_snoc a000000.wifi: failed to submit frame: -28 > [84977.840932] ath10k_snoc a000000.wifi: failed to transmit frame: -28 > > This issue is caused by race condition between skb_dequeue and > __skb_queue_tail. The queue of ‘wmi_mgmt_tx_queue’ is protected by a > different lock: ar->data_lock vs list->lock, the result is no protection. Nice catch! > --- a/drivers/net/wireless/ath/ath10k/mac.c > +++ b/drivers/net/wireless/ath/ath10k/mac.c > @@ -3763,23 +3763,16 @@ bool ath10k_mac_tx_frm_has_freq(struct ath10k *ar) > static int ath10k_mac_tx_wmi_mgmt(struct ath10k *ar, struct sk_buff *skb) > { > struct sk_buff_head *q = &ar->wmi_mgmt_tx_queue; > - int ret = 0; > - > - spin_lock_bh(&ar->data_lock); > > if (skb_queue_len(q) == ATH10K_MAX_NUM_MGMT_PENDING) { I believe you should be switching this to use skb_queue_len_lockless() too. And this still probably leaves a TOCTOU race; maybe we should use ">=" here, in case we queue a few SKBs simultaneously? It doesn't seem like we actually have a hard limit here, but it still seems like we shouldn't leave this potential inconsistency. Brian > ath10k_warn(ar, "wmi mgmt tx queue is full\n"); > - ret = -ENOSPC; > - goto unlock; > + return -ENOSPC; > } > > - __skb_queue_tail(q, skb); > + skb_queue_tail(q, skb); > ieee80211_queue_work(ar->hw, &ar->wmi_mgmt_tx_work);