On Mon, 2023-12-11 at 07:47 +0100, Greg KH wrote:
> On Sun, Dec 10, 2023 at 09:39:30PM +0000, Léo Lam wrote:
> > Commit 4a7e92551618f3737b305f62451353ee05662f57 ("wifi: cfg80211: fix
> > CQM for non-range use" on 6.6.x) causes nl80211_set_cqm_rssi not to
> > release the wdev lock in some situations.
> >
> > Of course, the ensuing deadlock causes userland network managers to
> > break pretty badly, and on typical systems this also causes lockups
> > on suspend, poweroff and reboot. See [1], [2], [3] for example reports.
> >
> > The upstream commit, 7e7efdda6adb385fbdfd6f819d76bc68c923c394
> > ("wifi: cfg80211: fix CQM for non-range use"), does not trigger this
> > issue because the wdev lock does not exist there.
> >
> > Fix the deadlock by releasing the lock before returning.
> >
> > [1] https://bugzilla.kernel.org/show_bug.cgi?id=218247
> > [2] https://bbs.archlinux.org/viewtopic.php?id=290976
> > [3] https://lore.kernel.org/all/87sf4belmm.fsf@xxxxxxxxxxxxx/
> >
> > Fixes: 4a7e92551618 ("wifi: cfg80211: fix CQM for non-range use")
> > Cc: stable@xxxxxxxxxxxxxxx
> > Signed-off-by: Léo Lam <leo@xxxxxxxxx>
> > ---
> >  net/wireless/nl80211.c | 18 ++++++++++++------
> >  1 file changed, 12 insertions(+), 6 deletions(-)
>

Apologies for the slow reply - been dealing with some eye soreness. :(

First of all, thank you for taking the time to review this and for
reverting the broken commit so quickly, as it seems quite a few users
were hitting this.

> So this is only for the 6.6.y tree? If so, you should at least cc: the
> other wireless developers involved in the original fix, right?
>

You're right. Sorry, I forgot to cc: johannes.berg@xxxxxxxxx. Just to
clarify, though: there is nothing wrong with their commit per se; the
issue comes from how it was backported without 076fc8775daf ("wifi:
cfg80211: remove wdev mutex").

> And what commit actually fixed this issue upstream, why not take that
> instead?
>

As far as I understand, this was never an issue upstream because
076fc8775daf ("wifi: cfg80211: remove wdev mutex") was committed in
August, *before* commit 7e7efdda6adb ("wifi: cfg80211: fix CQM for
non-range use") added the early returns in late November.

This only became an issue on the 6.1.x and 6.6.x trees because the CQM
fix commit was applied without first applying the "remove wdev mutex"
commit as well.

I did consider taking 076fc8775daf (i.e. removing the wdev mutex) and
applying it to the 6.6.x tree, but that diff is much bigger than 100
lines and I thought it would be simpler and safer to just fix the buggy
error handling. Especially for a newcomer who isn't very familiar with
the development process...

--
Thanks,
Leo
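For anyone following along, here is a minimal userspace sketch of the
bug class described above: an early return added to a function that
still holds a lock. This is not the kernel code or the actual patch.
The names dev_lock, set_threshold_buggy, set_threshold_fixed and
dev_lock_is_held are hypothetical, and a pthread mutex stands in for
the wdev lock.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for a per-device lock such as the wdev mutex. */
static pthread_mutex_t dev_lock = PTHREAD_MUTEX_INITIALIZER;

/* Buggy shape: the early return leaves dev_lock held, so the next
 * caller that tries to take the lock blocks forever. */
int set_threshold_buggy(bool supported)
{
    pthread_mutex_lock(&dev_lock);
    if (!supported)
        return -EOPNOTSUPP; /* BUG: dev_lock is never released */
    pthread_mutex_unlock(&dev_lock);
    return 0;
}

/* Fixed shape: every exit path goes through a single unlock label. */
int set_threshold_fixed(bool supported)
{
    int err = 0;

    pthread_mutex_lock(&dev_lock);
    if (!supported) {
        err = -EOPNOTSUPP;
        goto unlock;
    }
    /* ... the real work would happen here, under the lock ... */
unlock:
    pthread_mutex_unlock(&dev_lock);
    return err;
}

/* Probe helper: reports whether dev_lock is currently held, using
 * trylock so the probe itself cannot deadlock. */
bool dev_lock_is_held(void)
{
    if (pthread_mutex_trylock(&dev_lock) == 0) {
        pthread_mutex_unlock(&dev_lock);
        return false;
    }
    return true;
}
```

Calling set_threshold_buggy(false) and then probing with
dev_lock_is_held() shows the lock still held, which is the condition
that leads to the deadlock. The fixed variant leaves the lock free on
both the error path and the success path.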