Hi Ping-Ke, On Fri, Jan 21, 2022 at 9:10 AM Pkshih <pkshih@xxxxxxxxxxx> wrote: [...] > > > > I do stressed test of connection and suspend, and it get stuck after about > > 4 hours but no useful messages. I will re-build my kernel and turn on lockdep debug > > to see if it can tell me what is wrong. First of all: thank you so much for testing this and investigating the deadlock! > I found some deadlock: > > [ 4891.169653] CPU0 CPU1 > [ 4891.169732] ---- ---- > [ 4891.169799] lock(&rtwdev->mutex); > [ 4891.169874] lock(&local->sta_mtx); > [ 4891.169948] lock(&rtwdev->mutex); > [ 4891.170050] lock(&local->sta_mtx); > > > [ 4919.598630] CPU0 CPU1 > [ 4919.598715] ---- ---- > [ 4919.598779] lock(&local->iflist_mtx); > [ 4919.598900] lock(&rtwdev->mutex); > [ 4919.598995] lock(&local->iflist_mtx); > [ 4919.599092] lock(&rtwdev->mutex); This looks similar to the problem fixed by 5b0efb4d670c8b ("rtw88: avoid circular locking between local->iflist_mtx and rtwdev->mutex") which you have pointed out earlier. It seems to me that we should avoid using the mutex version of ieee80211_iterate_*() because it can lead to more of these issues. So from my point of view the general idea of the code from your attached patch looks good. That said, I'm still very new to mac80211/cfg80211 so I'm also interested in other's opinions. > So, I add wrappers to iterate rtw_iterate_stas() and rtw_iterate_vifs() that > use _atomic version to collect sta and vif, and use list_for_each() to iterate. > Reference code is attached, and I'm still thinking if we can have better method. With "better method" do you mean something like in patch #2 from this series (using unsigned int num_si and struct rtw_sta_info *si[RTW_MAX_MAC_ID_NUM] inside the iter_data) are you thinking of a better way in general? Best regards, Martin