On Tue, 2023-12-12 at 06:15 +0700, Philip Müller wrote: > On 12.12.23 05:57, Léo Lam wrote: > > On Mon, 2023-12-11 at 07:47 +0100, Greg KH wrote: > > > On Sun, Dec 10, 2023 at 09:39:30PM +0000, Léo Lam wrote: > > > > Commit 4a7e92551618f3737b305f62451353ee05662f57 ("wifi: cfg80211: fix > > > > CQM for non-range use" on 6.6.x) causes nl80211_set_cqm_rssi not to > > > > release the wdev lock in some situations. > > > > > > > > Of course, the ensuing deadlock causes userland network managers to > > > > break pretty badly, and on typical systems this also causes lockups on > > > > on suspend, poweroff and reboot. See [1], [2], [3] for example reports. > > > > > > > > The upstream commit, 7e7efdda6adb385fbdfd6f819d76bc68c923c394 > > > > ("wifi: cfg80211: fix CQM for non-range use"), does not trigger this > > > > issue because the wdev lock does not exist there. > > > > > > > > Fix the deadlock by releasing the lock before returning. > > > > > > > > [1] https://bugzilla.kernel.org/show_bug.cgi?id=218247 > > > > [2] https://bbs.archlinux.org/viewtopic.php?id=290976 > > > > [3] https://lore.kernel.org/all/87sf4belmm.fsf@xxxxxxxxxxxxx/ > > > > > > > > Fixes: 4a7e92551618 ("wifi: cfg80211: fix CQM for non-range use") > > > > Cc: stable@xxxxxxxxxxxxxxx > > > > Signed-off-by: Léo Lam <leo@xxxxxxxxx> > > > > --- > > > > net/wireless/nl80211.c | 18 ++++++++++++------ > > > > 1 file changed, 12 insertions(+), 6 deletions(-) > > > > > > > Apologies for the slow reply - been dealing with some eye soreness. :( > > > > First of all, thank you for taking the time to review this and for > > reverting the broken commit so quickly as it seems quite a few users > > were hitting this. > > > > > So this is only for the 6.6.y tree? If so, you should at least cc: the > > > other wireless developers involved in the original fix, right? > > > > > You're right. Sorry I forgot to cc: johannes.berg@xxxxxxxxx; though just > > to clarify, there is nothing wrong with their commit per se; the issue > > comes from how it was backported without 076fc8775daf ("wifi: cfg80211: > > remove wdev mutex"). > > > > > And what commit actually fixed this issue upstream, why not take that > > > instead? > > > > > > > As far as I understand, this was never an issue upstream because > > 076fc8775daf ("wifi: cfg80211: remove wdev mutex") was committed in > > August, *before* commit 7e7efdda6adb ("wifi: cfg80211: fix CQM for non- > > range use") added the early returns in late November. This only became > > an issue on the 6.1.x and 6.6.x trees because the CQM fix commit thxwas > > applied without first applying the "remove wdev mutex" as well. > > > > I did consider taking 076fc8775daf (i.e. removing the wdev mutex) and > > applying it to the 6.6.x tree but that diff is much bigger than 100 > > lines long and I thought it would be simpler and safer to just fix the > > buggy error handling. Especially for a newcomer who isn't very familiar > > with the development process... > > > > > > Hi Leo, > > thx for the patch. At least some users on my end can say it fixed the > issue for them. Also Johannes checked your patch by now: > https://lore.kernel.org/stable/DM4PR11MB5359FE14974D50E0D48C2D02E98FA@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/ > > So your patch can be applied via a patch series by including Johannes > Berg's patch as well. Addressing all error paths works too in the end ;) > Sorry if this is a newbie question, but just to confirm: do we really want a patch series with: 1. Johannes Berg's original patch (7e7efdda6adb); 2. my error handling patch ...instead of an adjusted version of 7e7efdda6adb with the error handling fixed for the older trees? I thought each patch in a series was supposed to produce a working kernel (to make bisects easier among other things). -- Thanks, Leo