On Tuesday 26 November 2024 08:27:43 CET Sverdlin, Alexander wrote: > > Hi Jerome, > > I've just got this (with CONFIG_PROVE_LOCKING) in idle, > without any traffic on the wlan interface: > > [119012.104386] INFO: trying to register non-static key. > [119012.109730] The code is fine but needs lockdep annotation, or maybe > [119012.116313] you didn't initialize this object before use? > [119012.121992] turning off the locking correctness validator. > [119012.127778] CPU: 0 UID: 0 PID: 336 Comm: kworker/0:1H Tainted: G O 6.11.0 > [119012.139802] Tainted: [O]=OOT_MODULE > [119012.148547] Workqueue: wfx_bh_wq bh_work [wfx] > [119012.153591] Call trace: > [119012.156282] dump_backtrace+0xa0/0x128 > [119012.160340] show_stack+0x20/0x38 > [119012.163939] dump_stack_lvl+0x8c/0xd0 > [119012.167890] dump_stack+0x18/0x28 > [119012.171472] register_lock_class+0x4b0/0x4c0 > [119012.176033] __lock_acquire+0xa0/0x1f50 > [119012.180148] lock_acquire+0x1f8/0x330 > [119012.184083] _raw_spin_lock_irqsave+0x68/0x98 > [119012.188731] skb_dequeue+0x2c/0xa8 > [119012.192390] wfx_tx_queues_get+0x20c/0x5a0 [wfx] > [119012.197525] bh_work+0x3bc/0x950 [wfx] > [119012.201749] process_one_work+0x234/0x658 > [119012.206040] worker_thread+0x1bc/0x360 > [119012.210056] kthread+0x124/0x130 > [119012.213535] ret_from_fork+0x10/0x20 > > the uptime is pretty high, as you can see, it's not in startup. > But I've noticed that NetworkManeger closes and opens > the interface from time to time, which leads to > wfx_remove_interface() of wvif->id 0 and consequent > wfx_add_interface() of wvif->id 0. And only 0, which seems to be relevant, > see below... Thank you for the report. Let's dive into this. > On Wed, 2023-10-04 at 19:28 +0200, Jérôme Pouiller wrote: > > Until now, all the traffic was blocked during scan operation. However, > > scan operation is going to be used to implement Remain On Channel (ROC). > > In this case, special frames (marked with IEEE80211_TX_CTL_TX_OFFCHAN) > > must be sent during the operation. > > > > These frames need to be sent on the virtual interface #2. Until now, > > this interface was only used by the device for internal purpose. But > > since API 3.9, it can be used to send data during scan operation (we > > hijack the scan process to implement ROC). > > > > Thus, we need to change a bit the way we match the frames with the > > interface. > > > > Fortunately, the frames received during the scan are marked with the > > correct interface number. So there is no change to do on this part. > > > > Signed-off-by: Jérôme Pouiller <jerome.pouiller@xxxxxxxxxx> > > --- > > drivers/net/wireless/silabs/wfx/data_tx.c | 36 ++++++++++++++++----- > > drivers/net/wireless/silabs/wfx/data_tx.h | 2 ++ > > drivers/net/wireless/silabs/wfx/queue.c | 38 ++++++++++++++++++----- > > drivers/net/wireless/silabs/wfx/queue.h | 1 + > > 4 files changed, 62 insertions(+), 15 deletions(-) > > > > diff --git a/drivers/net/wireless/silabs/wfx/data_tx.c b/drivers/net/wireless/silabs/wfx/data_tx.c > > index ce2b5dcfd8d89..e8b6d41f55196 100644 > > --- a/drivers/net/wireless/silabs/wfx/data_tx.c > > +++ b/drivers/net/wireless/silabs/wfx/data_tx.c > > @@ -226,6 +226,18 @@ struct wfx_hif_req_tx *wfx_skb_txreq(struct sk_buff *skb) > > return req; > > } > > > > +struct wfx_vif *wfx_skb_wvif(struct wfx_dev *wdev, struct sk_buff *skb) > > +{ > > + struct wfx_tx_priv *tx_priv = wfx_skb_tx_priv(skb); > > + struct wfx_hif_msg *hif = (struct wfx_hif_msg *)skb->data; > > + > > + if (tx_priv->vif_id != hif->interface && hif->interface != 2) { > > + dev_err(wdev->dev, "corrupted skb"); > > + return wdev_to_wvif(wdev, hif->interface); > > + } > > + return wdev_to_wvif(wdev, tx_priv->vif_id); > > +} > > + > > static u8 wfx_tx_get_link_id(struct wfx_vif *wvif, struct ieee80211_sta *sta, > > struct ieee80211_hdr *hdr) > > { > > @@ -352,6 +364,7 @@ static int wfx_tx_inner(struct wfx_vif *wvif, struct ieee80211_sta *sta, struct > > /* Fill tx_priv */ > > tx_priv = (struct wfx_tx_priv *)tx_info->rate_driver_data; > > tx_priv->icv_size = wfx_tx_get_icv_len(hw_key); > > + tx_priv->vif_id = wvif->id; > > > > /* Fill hif_msg */ > > WARN(skb_headroom(skb) < wmsg_len, "not enough space in skb"); > > @@ -362,7 +375,10 @@ static int wfx_tx_inner(struct wfx_vif *wvif, struct ieee80211_sta *sta, struct > > hif_msg = (struct wfx_hif_msg *)skb->data; > > hif_msg->len = cpu_to_le16(skb->len); > > hif_msg->id = HIF_REQ_ID_TX; > > - hif_msg->interface = wvif->id; > > + if (tx_info->flags & IEEE80211_TX_CTL_TX_OFFCHAN) > > + hif_msg->interface = 2; > > I never actually see wfx_add_interface() with id 2... > Which leaves all the queues uninitialized.... This is expected. Interface 2 is not a real interface. You can consider it as a special queue (for offchannel packets) on the device. [...] > > @@ -246,6 +250,26 @@ static struct sk_buff *wfx_tx_queues_get_skb(struct wfx_dev *wdev) > > } > > } > > > > + wvif = NULL; > > + while ((wvif = wvif_iterate(wdev, wvif)) != NULL) { > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Is there actually any point in iterating over wvifs here? > It has been done right before and all the queues are now sorted in > the common "queues"? hmm... you're probably right. > > > + for (i = 0; i < num_queues; i++) { > > + skb = skb_dequeue(&queues[i]->offchan); > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > Nevertheless, the lockdep splat comes from here, because > wfx_tx_queues_init() has never been called for wvif->id == 2. > > What was your original plan for this to happen? > Do we need an explicit analogue of wfx_add_interface() for vif->id 2 somewhere? > > I still have not come with a reproducer yet, but you definitely have more > insight into this code, maybe this will ring some bells on your side... > > PS. It's v6.11, even though it's exactly the same splan as in > "staging: wfx: fix potential use before init", but the patch in indeed inside. Yes, probably a very similar issue to "staging: wfx: fix potential use before init". I don't believe the issue is related to wvif->id == 2. You have only produced this issue once, that's it? I wonder why this does not happen with queues[i]->normal and queues[i]->cab. Is it because queues[i]->offchan is the first to be checked? Or mutex_is_locked(&wdev->scan_lock) has an impact in the process? In wfx_add_interface(), the list of wvif is protected by conf_lock. However, wfx_tx_queues_get_skb() is not protected by conf_lock. We initialize struct wvif before to add it to the wvif list and we consider it is sufficient. However, after reading memory-barriers.txt again, it's probably a wrong assumption. So, maybe this could fix the issue: diff --git i/drivers/net/wireless/silabs/wfx/sta.c w/drivers/net/wireless/silabs/wfx/sta.c index a904602f02ce..b22ea4243c0f 100644 --- i/drivers/net/wireless/silabs/wfx/sta.c +++ w/drivers/net/wireless/silabs/wfx/sta.c @@ -748,6 +748,7 @@ int wfx_add_interface(struct ieee80211_hw *hw, struct ieee80211_vif *vif) for (i = 0; i < ARRAY_SIZE(wdev->vif); i++) { if (!wdev->vif[i]) { + smp_mb(); wdev->vif[i] = vif; wvif->id = i; break; However, I am not confident in playing with memory barriers. -- Jérôme Pouiller