Thanks for the quick reply Jerome, On Tue, 2024-11-26 at 15:45 +0100, Jérôme Pouiller wrote: > > > + for (i = 0; i < num_queues; i++) { > > > + skb = skb_dequeue(&queues[i]->offchan); > > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > > > Nevertheless, the lockdep splat comes from here, because > > wfx_tx_queues_init() has never been called for wvif->id == 2. > > > > What was your original plan for this to happen? > > Do we need an explicit analogue of wfx_add_interface() for vif->id 2 somewhere? > > > > I still have not come with a reproducer yet, but you definitely have more > > insight into this code, maybe this will ring some bells on your side... > > > > PS. It's v6.11, even though it's exactly the same splan as in > > "staging: wfx: fix potential use before init", but the patch in indeed inside. > > Yes, probably a very similar issue to "staging: wfx: fix potential use > before init". I don't believe the issue is related to wvif->id == 2. > > You have only produced this issue once, that's it? > > I wonder why this does not happen with queues[i]->normal and > queues[i]->cab. Is it because queues[i]->offchan is the first to be > checked? Or mutex_is_locked(&wdev->scan_lock) has an impact in the > process? > > In wfx_add_interface(), the list of wvif is protected by conf_lock. > However, wfx_tx_queues_get_skb() is not protected by conf_lock. We > initialize struct wvif before to add it to the wvif list and we > consider it is sufficient. However, after reading memory-barriers.txt > again, it's probably a wrong assumption. I've actually disassembled the stack trace exactly to offchan processing. I have no idea why kernel sends offchan on a non-configured idle interface, I still need to come up with a reproducer. But as soon as there is an offchan in the sorted list of "queues" (coming originally from VIF 0: void wfx_tx(struct ieee80211_hw *hw, struct ieee80211_tx_control *control, struct sk_buff *skb) { ... if (tx_info->control.vif) wvif = (struct wfx_vif *)tx_info->control.vif->drv_priv; else wvif = wvif_iterate(wdev, NULL); ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Puts any offchan into offchan not of VIF 2, but of the only VIF 0... I think you are right, this could only be offchan queue of VIF 0. But then it's just a race of TX workqueue against wfx_remove_interface()/wfx_add_interface() pair (which I see regularly). We probably need RCU in the TX path and NetLink lock in the VIF add/remove path similar to other network drivers... I can try to come up with a patch for this... -- Alexander Sverdlin Siemens AG www.siemens.com