Hi Johan, > Doing things like hci_conn_hash_flush() while holding the hdev lock is > risky since its synchronous pending work cancellation could cause the > L2CAP layer to try to reacquire the hdev lock. Right now there doesn't > seem to be any obvious places where this would for certain happen but > it's already enough to cause lockdep to start warning against the hdev > and the work struct locks being taken in the "wrong" order: > > [ +0.000373] mgmt-tester/1603 is trying to acquire lock: > [ +0.000292] ((&conn->pending_rx_work)){+.+.+.}, at: [<c104266d>] flush_work+0x0/0x181 > [ +0.000270] > but task is already holding lock: > [ +0.000000] (&hdev->lock){+.+.+.}, at: [<c13b9a80>] hci_dev_do_close+0x166/0x359 > [ +0.000000] > which lock already depends on the new lock. > > [ +0.000000] > the existing dependency chain (in reverse order) is: > [ +0.000000] > -> #1 (&hdev->lock){+.+.+.}: > [ +0.000000] [<c105ea8f>] lock_acquire+0xe3/0x156 > [ +0.000000] [<c140c663>] mutex_lock_nested+0x54/0x375 > [ +0.000000] [<c13d644b>] l2cap_recv_frame+0x293/0x1a9c > [ +0.000000] [<c13d7ca4>] process_pending_rx+0x50/0x5e > [ +0.000000] [<c1041a3f>] process_one_work+0x21c/0x436 > [ +0.000000] [<c1041e3d>] worker_thread+0x1be/0x251 > [ +0.000000] [<c1045a22>] kthread+0x94/0x99 > [ +0.000000] [<c140f801>] ret_from_kernel_thread+0x21/0x30 > [ +0.000000] > -> #0 ((&conn->pending_rx_work)){+.+.+.}: > [ +0.000000] [<c105e158>] __lock_acquire+0xa07/0xc89 > [ +0.000000] [<c105ea8f>] lock_acquire+0xe3/0x156 > [ +0.000000] [<c1042696>] flush_work+0x29/0x181 > [ +0.000000] [<c1042864>] __cancel_work_timer+0x76/0x8f > [ +0.000000] [<c104288c>] cancel_work_sync+0xf/0x11 > [ +0.000000] [<c13d4c18>] l2cap_conn_del+0x72/0x183 > [ +0.000000] [<c13d8953>] l2cap_disconn_cfm+0x49/0x55 > [ +0.000000] [<c13be37a>] hci_conn_hash_flush+0x7a/0xc3 > [ +0.000000] [<c13b9af6>] hci_dev_do_close+0x1dc/0x359 > [ +0.012038] [<c13bbe38>] hci_unregister_dev+0x6e/0x1a3 > [ +0.000000] [<c12d33c1>] vhci_release+0x28/0x47 > [ +0.000000] [<c10dd6a9>] __fput+0xd6/0x154 > [ +0.000000] [<c10dd757>] ____fput+0xd/0xf > [ +0.000000] [<c1044bb2>] task_work_run+0x6b/0x8d > [ +0.000000] [<c1001bd2>] do_notify_resume+0x3c/0x3f > [ +0.000000] [<c140fa70>] work_notifysig+0x29/0x31 > [ +0.000000] > other info that might help us debug this: > > [ +0.000000] Possible unsafe locking scenario: > > [ +0.000000] CPU0 CPU1 > [ +0.000000] ---- ---- > [ +0.000000] lock(&hdev->lock); > [ +0.000000] lock((&conn->pending_rx_work)); > [ +0.000000] lock(&hdev->lock); > [ +0.000000] lock((&conn->pending_rx_work)); > [ +0.000000] > *** DEADLOCK *** > > Fully fixing this would require some quite heavy refactoring to change > how the hdev lock and hci_conn instances are handled together. A simpler > solution for now which this patch takes is to try ensure that the hdev > workqueue is empty before proceeding with the various cleanup calls, > including hci_conn_hash_flush(). > > Signed-off-by: Johan Hedberg <johan.hedberg@xxxxxxxxx> > --- > net/bluetooth/hci_core.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) patch has been applied to bluetooth-next tree. Regards Marcel -- To unsubscribe from this list: send the line "unsubscribe linux-bluetooth" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html