Hi,

> -----Original Message-----
> From: Martin Blumenstingl <martin.blumenstingl@xxxxxxxxxxxxxx>
> Sent: Monday, January 24, 2022 3:04 AM
> To: Pkshih <pkshih@xxxxxxxxxxx>
> Cc: linux-wireless@xxxxxxxxxxxxxxx; tony0620emma@xxxxxxxxx; kvalo@xxxxxxxxxxxxxx;
> johannes@xxxxxxxxxxxxxxxx; netdev@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; Neo Jou
> <neojou@xxxxxxxxx>; Jernej Skrabec <jernej.skrabec@xxxxxxxxx>; Ed Swierk <eswierk@xxxxx>
> Subject: Re: [PATCH v3 0/8] rtw88: prepare locking for SDIO support
>
> Hi Ping-Ke,
>
> On Fri, Jan 21, 2022 at 9:10 AM Pkshih <pkshih@xxxxxxxxxxx> wrote:
> [...]
> > >
> > > I do stressed test of connection and suspend, and it get stuck after about
> > > 4 hours but no useful messages. I will re-build my kernel and turn on lockdep debug
> > > to see if it can tell me what is wrong.
> First of all: thank you so much for testing this and investigating the deadlock!
>
> > I found some deadlock:
> >
> > [ 4891.169653]        CPU0                    CPU1
> > [ 4891.169732]        ----                    ----
> > [ 4891.169799]   lock(&rtwdev->mutex);
> > [ 4891.169874]                                lock(&local->sta_mtx);
> > [ 4891.169948]                                lock(&rtwdev->mutex);
> > [ 4891.170050]   lock(&local->sta_mtx);
> >
> >
> > [ 4919.598630]        CPU0                    CPU1
> > [ 4919.598715]        ----                    ----
> > [ 4919.598779]   lock(&local->iflist_mtx);
> > [ 4919.598900]                                lock(&rtwdev->mutex);
> > [ 4919.598995]                                lock(&local->iflist_mtx);
> > [ 4919.599092]   lock(&rtwdev->mutex);
> This looks similar to the problem fixed by 5b0efb4d670c8b ("rtw88:
> avoid circular locking between local->iflist_mtx and rtwdev->mutex")
> which you have pointed out earlier.
> It seems to me that we should avoid using the mutex version of
> ieee80211_iterate_*() because it can lead to more of these issues. So
> from my point of view the general idea of the code from your attached
> patch looks good. That said, I'm still very new to mac80211/cfg80211
> so I'm also interested in other's opinions.
>

The attached patch works in most cases, because both the callers of iterate()
and ::remove_interface hold rtwdev->mutex. In theory, the exception is a caller
that queues another work item to call iterate() after ::remove_interface has
returned but before mac80211 has freed the vif, and that work item then runs
after mac80211 frees the vif. This would lead to a use-after-free, but I'm not
sure whether this scenario can actually happen. I need time to dig into this,
or you can help look into it.

To avoid this, we can add a flag to struct rtw_vif and set it in
::remove_interface. Then, when we use the _atomic version of iterate(), we only
collect vifs without this flag into the list. A similar fix can be applied to
ieee80211_sta.

> > So, I add wrappers to iterate rtw_iterate_stas() and rtw_iterate_vifs() that
> > use _atomic version to collect sta and vif, and use list_for_each() to iterate.
> > Reference code is attached, and I'm still thinking if we can have better method.
> With "better method" do you mean something like in patch #2 from this
> series (using unsigned int num_si and struct rtw_sta_info
> *si[RTW_MAX_MAC_ID_NUM] inside the iter_data) are you thinking of a
> better way in general?
>

I would prefer a straightforward method. For example, we could have another
version of ieee80211_iterate_xxx() and do the work in the iterator, as the
original code does, so that we only need to change the code slightly.

My initial idea is to hold a driver lock, such as rtwdev->mutex, both where we
call ieee80211_iterate_() and where we remove a sta or vif. Hopefully, this
ensures it is safe to run the iterator without other locks.
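As a side note on the flag idea above, here is a minimal userspace sketch of
"collect in the atomic iterator, skip flagged entries, then do the work
outside". It is only a mock to show the shape of the logic, not driver code:
struct mock_vif, the "removing" field, mock_iterate_atomic() and
mock_iterate_vifs() are all hypothetical names that stand in for struct
rtw_vif, the proposed flag, the _atomic iterate call and the rtw_iterate_vifs()
wrapper. Locking is left out on purpose; in the driver both the flag and the
wrapper would be used under rtwdev->mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_VIFS 4

/* Hypothetical stand-in for struct rtw_vif. */
struct mock_vif {
	int id;
	bool removing;	/* assumed flag, set in the ::remove_interface path */
};

/* Pointers gathered during the non-sleeping iteration phase. */
struct collect_data {
	struct mock_vif *vifs[MOCK_MAX_VIFS];
	size_t num;
};

/* Iterator for the "atomic" phase: it only records pointers. */
static void collect_iter(void *data, struct mock_vif *vif)
{
	struct collect_data *cd = data;

	/* Skip vifs that are being removed, and never overflow the array. */
	if (vif->removing || cd->num >= MOCK_MAX_VIFS)
		return;
	cd->vifs[cd->num++] = vif;
}

/* Mock of an _atomic style iterate call over all known vifs. */
static void mock_iterate_atomic(struct mock_vif *all, size_t n,
				void (*iter)(void *data, struct mock_vif *vif),
				void *data)
{
	for (size_t i = 0; i < n; i++)
		iter(data, &all[i]);
}

/*
 * Wrapper: collect first, then call work() on each collected vif outside
 * the iterator, where sleeping would be allowed in the real driver.
 * Returns the number of vifs visited.
 */
static size_t mock_iterate_vifs(struct mock_vif *all, size_t n,
				void (*work)(struct mock_vif *vif))
{
	struct collect_data cd = { .num = 0 };

	mock_iterate_atomic(all, n, collect_iter, &cd);
	for (size_t i = 0; i < cd.num; i++)
		work(cd.vifs[i]);
	return cd.num;
}

/* Demo work function: sums the ids of the vifs it is called on. */
static int visited_sum;

static void demo_work(struct mock_vif *vif)
{
	visited_sum += vif->id;
}
```

With three vifs where the middle one has the flag set, only the first and the
last reach demo_work(). The flag only narrows the window; it does not replace
the check of whether the work item can really run after the vif is freed.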
Given that locking rule, we can define another ieee80211_iterate_() version
that takes a drv_lock argument, like

#define ieee80211_iterate_active_interfaces_drv_lock(hw, iter_flags, iterator, data, drv_lock) \
do { \
	lockdep_assert_held(drv_lock); \
	ieee80211_iterate_active_interfaces_no_lock(hw, iter_flags, iterator, data); \
} while (0)

The drv_lock argument keeps users from forgetting to hold the lock, and we need
a helper for the no_lock version:

void ieee80211_iterate_active_interfaces_no_lock(
	struct ieee80211_hw *hw, u32 iter_flags,
	void (*iterator)(void *data, u8 *mac,
			 struct ieee80211_vif *vif),
	void *data)
{
	struct ieee80211_local *local = hw_to_local(hw);

	__iterate_interfaces(local, iter_flags | IEEE80211_IFACE_ITER_ACTIVE,
			     iterator, data);
}

However, as I mentioned, in theory this is not entirely safe. So, I think the
easiest way is to maintain the vif/sta lists in the driver in
::{add,remove}_interface/::sta_{add,remove}, and hold rtwdev->mutex to access
these lists. But Johannes pointed out that this is not a good idea [1].

[1] https://lore.kernel.org/linux-wireless/d61f3947cddec660cbb2a59e2424d2bd8c01346a.camel@xxxxxxxxxxxxxxxx/

--
Ping-Ke