On 12.12.23 05:57, Léo Lam wrote:
On Mon, 2023-12-11 at 07:47 +0100, Greg KH wrote:
On Sun, Dec 10, 2023 at 09:39:30PM +0000, Léo Lam wrote:
Commit 4a7e92551618f3737b305f62451353ee05662f57 ("wifi: cfg80211: fix
CQM for non-range use" on 6.6.x) causes nl80211_set_cqm_rssi not to
release the wdev lock in some situations.
Of course, the ensuing deadlock causes userland network managers to
break pretty badly, and on typical systems this also causes lockups on
on suspend, poweroff and reboot. See [1], [2], [3] for example reports.
The upstream commit, 7e7efdda6adb385fbdfd6f819d76bc68c923c394
("wifi: cfg80211: fix CQM for non-range use"), does not trigger this
issue because the wdev lock does not exist there.
Fix the deadlock by releasing the lock before returning.
[1] https://bugzilla.kernel.org/show_bug.cgi?id=218247
[2] https://bbs.archlinux.org/viewtopic.php?id=290976
[3] https://lore.kernel.org/all/87sf4belmm.fsf@xxxxxxxxxxxxx/
Fixes: 4a7e92551618 ("wifi: cfg80211: fix CQM for non-range use")
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Léo Lam <leo@xxxxxxxxx>
---
net/wireless/nl80211.c | 18 ++++++++++++------
1 file changed, 12 insertions(+), 6 deletions(-)
Apologies for the slow reply - been dealing with some eye soreness. :(
First of all, thank you for taking the time to review this and for
reverting the broken commit so quickly as it seems quite a few users
were hitting this.
So this is only for the 6.6.y tree? If so, you should at least cc: the
other wireless developers involved in the original fix, right?
You're right. Sorry I forgot to cc: johannes.berg@xxxxxxxxx; though just
to clarify, there is nothing wrong with their commit per se; the issue
comes from how it was backported without 076fc8775daf ("wifi: cfg80211:
remove wdev mutex").
And what commit actually fixed this issue upstream, why not take that
instead?
As far as I understand, this was never an issue upstream because
076fc8775daf ("wifi: cfg80211: remove wdev mutex") was committed in
August, *before* commit 7e7efdda6adb ("wifi: cfg80211: fix CQM for non-
range use") added the early returns in late November. This only became
an issue on the 6.1.x and 6.6.x trees because the CQM fix commit thxwas
applied without first applying the "remove wdev mutex" as well.
I did consider taking 076fc8775daf (i.e. removing the wdev mutex) and
applying it to the 6.6.x tree but that diff is much bigger than 100
lines long and I thought it would be simpler and safer to just fix the
buggy error handling. Especially for a newcomer who isn't very familiar
with the development process...
Hi Leo,
thx for the patch. At least some users on my end can say it fixed the
issue for them. Also Johannes checked your patch by now:
https://lore.kernel.org/stable/DM4PR11MB5359FE14974D50E0D48C2D02E98FA@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
So your patch can be applied via a patch series by including Johannes
Berg's patch as well. Addressing all error paths works too in the end ;)
--
Best, Philip