Hi Johannes, On 19.01.2021 10:21, Johannes Berg wrote: > From: Johannes Berg <johannes.berg@xxxxxxxxx> > > Currently, _everything_ in cfg80211 holds the RTNL, and if you > have a slow USB device (or a few) you can get some bad lock > contention on that. > > Fix that by re-adding a mutex to each wiphy/rdev as we had at > some point, so we have locking for the wireless_dev lists and > all the other things in there, and also so that drivers still > don't have to worry too much about it (they still won't get > parallel calls for a single device). > > Then, we can restrict the RTNL to a few cases where we add or > remove interfaces and really need the added protection. Some > of the global list management still also uses the RTNL, since > we need to have it anyway for netdev management, but we only > hold the RTNL for very short periods of time here. > > Signed-off-by: Johannes Berg <johannes.berg@xxxxxxxxx> This patch landed in today's (20210122) linux-next as commit 791daf8fc49a ("cfg80211: avoid holding the RTNL when calling the driver"). Sadly, it causes deadlock with mwifiex driver. I think that lockdep report describes it enough: Bluetooth: vendor=0x2df, device=0x912e, class=255, fn=2 cfg80211: Loading compiled-in X.509 certificates for regulatory database cfg80211: Loaded X.509 cert 'sforshee: 00b28ddf47aef9cea7' Bluetooth: FW download over, size 800344 bytes btmrvl_sdio mmc2:0001:2: sdio device tree data not available mwifiex_sdio mmc2:0001:1: WLAN is not the winner! Skip FW dnld mwifiex_sdio mmc2:0001:1: WLAN FW is active mwifiex_sdio mmc2:0001:1: CMD_RESP: cmd 0x242 error, result=0x2 mwifiex_sdio mmc2:0001:1: mwifiex_process_cmdresp: cmd 0x242 failed during initialization ============================================ WARNING: possible recursive locking detected 5.11.0-rc4-00535-g791daf8fc49a #2336 Not tainted -------------------------------------------- kworker/2:3/108 is trying to acquire lock: c4f62b38 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: _mwifiex_fw_dpc+0x2c0/0x49c [mwifiex] but task is already holding lock: c4f62b38 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: _mwifiex_fw_dpc+0x248/0x49c [mwifiex] other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(&rdev->wiphy.mtx); lock(&rdev->wiphy.mtx); *** DEADLOCK *** May be due to missing lock nesting notation 4 locks held by kworker/2:3/108: #0: c1c066a8 ((wq_completion)events){+.+.}-{0:0}, at: process_one_work+0x24c/0x888 #1: deccbf10 ((work_completion)(&fw_work->work)){+.+.}-{0:0}, at: process_one_work+0x24c/0x888 #2: c13202dc (rtnl_mutex){+.+.}-{3:3}, at: _mwifiex_fw_dpc+0x23c/0x49c [mwifiex] #3: c4f62b38 (&rdev->wiphy.mtx){+.+.}-{3:3}, at: _mwifiex_fw_dpc+0x248/0x49c [mwifiex] stack backtrace: CPU: 2 PID: 108 Comm: kworker/2:3 Not tainted 5.11.0-rc4-00535-g791daf8fc49a #2336 Hardware name: Samsung Exynos (Flattened Device Tree) Workqueue: events request_firmware_work_func [<c01116e8>] (unwind_backtrace) from [<c010cf58>] (show_stack+0x10/0x14) [<c010cf58>] (show_stack) from [<c0b3ad3c>] (dump_stack+0xa4/0xc4) [<c0b3ad3c>] (dump_stack) from [<c0195fd8>] (__lock_acquire+0xc20/0x31cc) [<c0195fd8>] (__lock_acquire) from [<c019923c>] (lock_acquire+0x2e4/0x5dc) [<c019923c>] (lock_acquire) from [<c0b4217c>] (__mutex_lock+0xa4/0xb60) [<c0b4217c>] (__mutex_lock) from [<c0b42c54>] (mutex_lock_nested+0x1c/0x24) [<c0b42c54>] (mutex_lock_nested) from [<bf1c87f8>] (_mwifiex_fw_dpc+0x2c0/0x49c [mwifiex]) [<bf1c87f8>] (_mwifiex_fw_dpc [mwifiex]) from [<c06bfd18>] (request_firmware_work_func+0x58/0x94) [<c06bfd18>] (request_firmware_work_func) from [<c0149d48>] (process_one_work+0x30c/0x888) [<c0149d48>] (process_one_work) from [<c014a31c>] (worker_thread+0x58/0x594) [<c014a31c>] (worker_thread) from [<c0151284>] (kthread+0x154/0x19c) [<c0151284>] (kthread) from [<c010011c>] (ret_from_fork+0x14/0x38) Exception stack(0xdeccbfb0 to 0xdeccbff8) ... > ... Best regards -- Marek Szyprowski, PhD Samsung R&D Institute Poland