Hi Alex, aahringo@xxxxxxxxxx wrote on Fri, 3 Jun 2022 21:50:15 -0400: > Hi, > > On Fri, Jun 3, 2022 at 1:55 PM Miquel Raynal <miquel.raynal@xxxxxxxxxxx> wrote: > > > > Hi Stefan, Alex, > > > > stefan@xxxxxxxxxxxxxxxxxx wrote on Wed, 1 Jun 2022 23:01:51 +0200: > > > > > Hello. > > > > > > On 01.06.22 05:30, Alexander Aring wrote: > > > > Hi, > > > > > > > > On Thu, May 19, 2022 at 11:06 AM Miquel Raynal > > > > <miquel.raynal@xxxxxxxxxxx> wrote: > > > >> > > > >> Hello, > > > >> > > > >> This series brings support for that famous synchronous Tx API for MLME > > > >> commands. > > > >> > > > >> MLME commands will be used during scan operations. In this situation, > > > >> we need to be sure that all transfers finished and that no transfer > > > >> will be queued for a short moment. > > > >> > > > > > > > > Acked-by: Alexander Aring <aahringo@xxxxxxxxxx> > > > > > > These patches have been applied to the wpan-next tree. Thanks! > > > > > > > There will be now functions upstream which will never be used, Stefan > > > > should wait until they are getting used before sending it to net-next. > > > > > > Indeed this can wait until we have a consumer of the functions before pushing this forward to net-next. Pretty sure Miquel is happy to finally move on to other pieces of his puzzle and use them. :-) > > > > Next part is coming! > > > > In the mean time I've experienced a new lockdep warning: > > > > All the netlink commands are executed with the rtnl taken. > > In my current implementation, when I configure/edit a scan request or a > > beacon request I take a scan_lock or a beacons_lock, so they may only > > be taken after the rtnl in this case, which leads to this sequence of > > events: > > - the rtnl is taken (by the net core) > > - the beacon's lock is taken > > > > But now in a beacon's work or an active scan work, what happens is: > > - work gets woken up > > - the beacon/scan lock is taken > > - a beacon/beacon-request frame is transmitted > > - the rtnl lock is taken during this transmission > > > > Lockdep then detects a possible circular dependency: > > [ 490.153387] CPU0 CPU1 > > [ 490.153391] ---- ---- > > [ 490.153394] lock(&local->beacons_lock); > > [ 490.153400] lock(rtnl_mutex); > > [ 490.153406] lock(&local->beacons_lock); > > [ 490.153412] lock(rtnl_mutex); > > > > So in practice, I always need to have the rtnl lock taken when > > acquiring these other locks (beacon/scan_lock) which I think is far > > from optimal. > > > > *Note that those can also be false positives. > > > 1# One solution is to drop the beacons/scan locks because they are not > > useful anymore and simply rely on the rtnl. > > > > depends on how long it will be held. > > > 2# Another solution would be to change the mlme_tx() implementation to > > finally not need the rtnl at all. > > > > Note that just calling ASSERT_RTNL() makes no difference in 2#, it > > still means that I always need to acquire the rtnl before acquiring the > > beacons/scan locks, which greatly reduces their usefulness and leads to > > solution 1# in the end. > > > > IIRC I decided to introduce the rtnl to avoid ->ndo_stop() calls during > > an MLME transmission. I don't know if it has another use there. If not, > > we may perhaps get rid of the rtnl in mlme_tx() by really handling the > > stop calls (but I was too lazy so far to do that). > > > > What direction would you advise? > > Hard to say without code. Please show us some code of the current > state... there should also be some stacktrace of the circular lock > dependency, please provide the full output _matching_ the provided > code. Of course, here is the branch that I used to produce the warning: https://github.com/miquelraynal/linux/ branch wpan-next/scan Triggering this is just a matter or executing nl802154_send_beacons(). And here is the trace which appears in the dmesg: [ 234.224911] mac802154_hwsim mac802154_hwsim: Added 2 mac802154 hwsim hardware radios [ 257.846221] Sending beacon [ 257.847439] ====================================================== [ 257.847446] WARNING: possible circular locking dependency detected [ 257.847463] 5.18.0-rc4-uwb+ #217 Not tainted [ 257.847473] ------------------------------------------------------ [ 257.847479] kworker/u4:4/53 is trying to acquire lock: [ 257.847488] ffffffff9d049d48 (rtnl_mutex){+.+.}-{3:3}, at: ieee802154_mlme_tx+0xf/0x160 [mac802154] [ 257.847577] but task is already holding lock: [ 257.847584] ffff89b082ea7ae0 (&local->beacons_lock){+.+.}-{3:3}, at: mac802154_beacons_work+0x1d/0xb0 [mac802154] [ 257.847651] which lock already depends on the new lock. [ 257.847668] the existing dependency chain (in reverse order) is: [ 257.847674] -> #1 (&local->beacons_lock){+.+.}-{3:3}: [ 257.847702] __mutex_lock+0x9d/0x9a0 [ 257.847719] mac802154_send_beacons+0x32/0x80 [mac802154] [ 257.847767] nl802154_send_beacons+0xd7/0x1f0 [ieee802154] [ 257.847829] genl_family_rcv_msg_doit+0xe5/0x140 [ 257.847842] genl_rcv_msg+0xd7/0x1e0 [ 257.847852] netlink_rcv_skb+0x4c/0xf0 [ 257.847862] genl_rcv+0x1f/0x30 [ 257.847871] netlink_unicast+0x191/0x260 [ 257.847882] netlink_sendmsg+0x22e/0x480 [ 257.847892] sock_sendmsg+0x59/0x60 [ 257.847907] ____sys_sendmsg+0x20c/0x260 [ 257.847922] ___sys_sendmsg+0x7c/0xc0 [ 257.847932] __sys_sendmsg+0x54/0xa0 [ 257.847942] do_syscall_64+0x3b/0x90 [ 257.847956] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 257.847972] -> #0 (rtnl_mutex){+.+.}-{3:3}: [ 257.847989] __lock_acquire+0x1253/0x22e0 [ 257.848002] lock_acquire+0xca/0x2f0 [ 257.848011] __mutex_lock+0x9d/0x9a0 [ 257.848023] ieee802154_mlme_tx+0xf/0x160 [mac802154] [ 257.848058] ieee802154_mlme_tx_one+0x2d/0x40 [mac802154] [ 257.848092] mac802154_beacons_work.cold+0x100/0x110 [mac802154] [ 257.848135] process_one_work+0x26f/0x5a0 [ 257.848147] worker_thread+0x4a/0x3d0 [ 257.848158] kthread+0xee/0x120 [ 257.848168] ret_from_fork+0x22/0x30 [ 257.848180] other info that might help us debug this: [ 257.848187] Possible unsafe locking scenario: [ 257.848192] CPU0 CPU1 [ 257.848198] ---- ---- [ 257.848203] lock(&local->beacons_lock); [ 257.848215] lock(rtnl_mutex); [ 257.848226] lock(&local->beacons_lock); [ 257.848236] lock(rtnl_mutex); [ 257.848246] *** DEADLOCK *** [ 257.848252] 3 locks held by kworker/u4:4/53: [ 257.848262] #0: ffff89b0b66b4138 ((wq_completion)phy0){+.+.}-{0:0}, at: process_one_work+0x1ef/0x5a0 [ 257.848290] #1: ffffa021404e3e78 ((work_completion)(&(&local->beacons_work)->work)){+.+.}-{0:0}, at: process_one_work+0x1ef/0x5a0 [ 257.848317] #2: ffff89b082ea7ae0 (&local->beacons_lock){+.+.}-{3:3}, at: mac802154_beacons_work+0x1d/0xb0 [mac802154] [ 257.848371] stack backtrace: [ 257.848388] CPU: 1 PID: 53 Comm: kworker/u4:4 Not tainted 5.18.0-rc4-uwb+ #217 [ 257.848404] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 [ 257.848422] Workqueue: phy0 mac802154_beacons_work [mac802154] [ 257.848472] Call Trace: [ 257.848490] <TASK> [ 257.848507] dump_stack_lvl+0x45/0x59 [ 257.848536] check_noncircular+0xfe/0x110 [ 257.848559] __lock_acquire+0x1253/0x22e0 [ 257.848581] lock_acquire+0xca/0x2f0 [ 257.848592] ? ieee802154_mlme_tx+0xf/0x160 [mac802154] [ 257.848638] __mutex_lock+0x9d/0x9a0 [ 257.848652] ? ieee802154_mlme_tx+0xf/0x160 [mac802154] [ 257.848690] ? mark_held_locks+0x49/0x70 [ 257.848701] ? ieee802154_mlme_tx+0xf/0x160 [mac802154] [ 257.848737] ? _raw_spin_unlock_irqrestore+0x28/0x50 [ 257.848753] ? lockdep_hardirqs_on+0x79/0x100 [ 257.848770] ? ieee802154_mlme_tx+0xf/0x160 [mac802154] [ 257.848804] ieee802154_mlme_tx+0xf/0x160 [mac802154] [ 257.848841] ieee802154_mlme_tx_one+0x2d/0x40 [mac802154] [ 257.848878] mac802154_beacons_work.cold+0x100/0x110 [mac802154] [ 257.848924] process_one_work+0x26f/0x5a0 [ 257.848944] worker_thread+0x4a/0x3d0 [ 257.848959] ? process_one_work+0x5a0/0x5a0 [ 257.848971] kthread+0xee/0x120 [ 257.848981] ? kthread_complete_and_exit+0x20/0x20 [ 257.848995] ret_from_fork+0x22/0x30 [ 257.849022] </TASK>