Kernel v6.0-rc7 triggered the WARN "possible circular locking dependency detected" at blktests test case block/011 for NVME devices [1]. The trigger commit was c0feea594e05 ("workqueue: don't skip lockdep work dependency in cancel_work_sync()"). The commit was backported to v5.15.71 stable kernel and the WARN is observed with the stable kernel also. It looks that the possible deadlock scenario has been hidden for long time and the commit revealed it now. As the commit title says, it restored lockdep work dependency in cancel_work_sync(), which is called from nvme_reset_work() via blk_sync_queue(). The test case repeats PCI disable during I/O on the PCI-NVME device, then nvme_reset_work() call is repeated. The WARN was observed with QEMU NVME emulation device. I tried two of my real NVME devices, and one of them recreated the WARN. The other real NVME device did not recreate it. I tried to understand the deadlock scenario, but can not tell if it is for real, or false-positive. Help to understand the scenario will be appreciated. The lockdep splat mentions three locks (ctrl->namespaces_rwsem, dev->shutdown_lock and q->timeout_work). In the related function call paths, it looks that ctrl->namespaces_rwsem is locked only for read, so the deadlock scenario does not look possible for me. (Maybe I'm misunderstanding something...) [1] [ 3033.791353] run blktests block/011 at 2022-09-27 13:24:34 [ 3064.944407] nvme nvme5: controller is down; will reset: CSTS=0x2, PCI_STATUS=0x10 [ 3066.531825] ====================================================== [ 3066.533372] WARNING: possible circular locking dependency detected [ 3066.534914] 6.0.0-rc7 #1 Not tainted [ 3066.535891] ------------------------------------------------------ [ 3066.537492] kworker/u8:0/904 is trying to acquire lock: [ 3066.538807] ffff888102f24788 ((work_completion)(&q->timeout_work)){+.+.}-{0:0}, at: __flush_work+0xc2/0x900 [ 3066.541161] but task is already holding lock: [ 3066.542793] ffff888116840528 (&ctrl->namespaces_rwsem){++++}-{3:3}, at: nvme_sync_queues+0x26/0x100 [nvme_core] [ 3066.545294] which lock already depends on the new lock. [ 3066.547563] the existing dependency chain (in reverse order) is: [ 3066.549530] -> #2 (&ctrl->namespaces_rwsem){++++}-{3:3}: [ 3066.551431] down_read+0x3e/0x50 [ 3066.552421] nvme_start_freeze+0x24/0xb0 [nvme_core] [ 3066.553862] nvme_dev_disable+0x4ee/0x660 [nvme] [ 3066.555208] nvme_timeout.cold+0x3f2/0x699 [nvme] [ 3066.556575] blk_mq_check_expired+0x19a/0x270 [ 3066.557865] bt_iter+0x21e/0x300 [ 3066.558935] blk_mq_queue_tag_busy_iter+0x7fd/0x1240 [ 3066.560356] blk_mq_timeout_work+0x143/0x380 [ 3066.562602] process_one_work+0x816/0x1300 [ 3066.563847] worker_thread+0xfc/0x1270 [ 3066.564942] kthread+0x29b/0x340 [ 3066.565973] ret_from_fork+0x1f/0x30 [ 3066.567083] -> #1 (&dev->shutdown_lock){+.+.}-{3:3}: [ 3066.568851] __mutex_lock+0x14c/0x12b0 [ 3066.570010] nvme_dev_disable+0x5d/0x660 [nvme] [ 3066.571309] nvme_timeout.cold+0x3f2/0x699 [nvme] [ 3066.572678] blk_mq_check_expired+0x19a/0x270 [ 3066.573972] bt_iter+0x21e/0x300 [ 3066.575024] blk_mq_queue_tag_busy_iter+0x7fd/0x1240 [ 3066.576431] blk_mq_timeout_work+0x143/0x380 [ 3066.577688] process_one_work+0x816/0x1300 [ 3066.578882] worker_thread+0xfc/0x1270 [ 3066.580046] kthread+0x29b/0x340 [ 3066.581081] ret_from_fork+0x1f/0x30 [ 3066.582194] -> #0 ((work_completion)(&q->timeout_work)){+.+.}-{0:0}: [ 3066.584265] __lock_acquire+0x2843/0x5520 [ 3066.585426] lock_acquire+0x194/0x4f0 [ 3066.586458] __flush_work+0xe2/0x900 [ 3066.587484] __cancel_work_timer+0x27a/0x3a0 [ 3066.588568] nvme_sync_queues+0x71/0x100 [nvme_core] [ 3066.589782] nvme_reset_work+0x137/0x39f0 [nvme] [ 3066.590896] process_one_work+0x816/0x1300 [ 3066.591895] worker_thread+0xfc/0x1270 [ 3066.592795] kthread+0x29b/0x340 [ 3066.593654] ret_from_fork+0x1f/0x30 [ 3066.594569] other info that might help us debug this: [ 3066.596290] Chain exists of: (work_completion)(&q->timeout_work) --> &dev->shutdown_lock --> &ctrl->namespaces_rwsem [ 3066.598960] Possible unsafe locking scenario: [ 3066.600167] CPU0 CPU1 [ 3066.600985] ---- ---- [ 3066.601779] lock(&ctrl->namespaces_rwsem); [ 3066.602506] lock(&dev->shutdown_lock); [ 3066.603597] lock(&ctrl->namespaces_rwsem); [ 3066.604739] lock((work_completion)(&q->timeout_work)); [ 3066.605653] *** DEADLOCK *** [ 3066.606728] 3 locks held by kworker/u8:0/904: [ 3066.607436] #0: ffff888111fe3138 ((wq_completion)nvme-reset-wq){+.+.}-{0:0}, at: process_one_work+0x73c/0x1300 [ 3066.608959] #1: ffff88811ddcfdd0 ((work_completion)(&dev->ctrl.reset_work)){+.+.}-{0:0}, at: process_one_work+0x769/0x1300 [ 3066.610491] #2: ffff888116840528 (&ctrl->namespaces_rwsem){++++}-{3:3}, at: nvme_sync_queues+0x26/0x100 [nvme_core] [ 3066.612010] stack backtrace: [ 3066.612819] CPU: 1 PID: 904 Comm: kworker/u8:0 Not tainted 6.0.0-rc7 #1 [ 3066.613951] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 [ 3066.615613] Workqueue: nvme-reset-wq nvme_reset_work [nvme] [ 3066.616532] Call Trace: [ 3066.616985] <TASK> [ 3066.617378] dump_stack_lvl+0x5b/0x77 [ 3066.618057] check_noncircular+0x242/0x2f0 [ 3066.618734] ? lock_chain_count+0x20/0x20 [ 3066.727613] ? print_circular_bug+0x1e0/0x1e0 [ 3066.950865] ? unwind_next_frame+0x21b/0x1e60 [ 3066.951388] ? ret_from_fork+0x1f/0x30 [ 3066.951855] ? lockdep_lock+0xbc/0x1a0 [ 3066.952361] ? call_rcu_zapped+0xc0/0xc0 [ 3066.952835] __lock_acquire+0x2843/0x5520 [ 3066.953315] ? stack_trace_save+0x81/0xa0 [ 3066.953845] ? lockdep_hardirqs_on_prepare+0x410/0x410 [ 3066.954439] lock_acquire+0x194/0x4f0 [ 3066.954931] ? __flush_work+0xc2/0x900 [ 3066.955389] ? lock_downgrade+0x6b0/0x6b0 [ 3066.955865] ? lockdep_unlock+0xf2/0x250 [ 3066.956380] ? __lock_acquire+0x2080/0x5520 [ 3066.956878] __flush_work+0xe2/0x900 [ 3066.957316] ? __flush_work+0xc2/0x900 [ 3066.957813] ? mark_lock+0xee/0x1610 [ 3066.958249] ? queue_delayed_work_on+0x90/0x90 [ 3066.958774] ? lock_chain_count+0x20/0x20 [ 3066.959312] ? lock_acquire+0x194/0x4f0 [ 3066.959776] ? lock_acquire+0x1a4/0x4f0 [ 3066.960233] ? lock_is_held_type+0xe3/0x140 [ 3066.960777] ? find_held_lock+0x2d/0x110 [ 3066.961247] ? mark_held_locks+0x9e/0xe0 [ 3066.961707] ? lockdep_hardirqs_on_prepare+0x17b/0x410 [ 3066.962330] __cancel_work_timer+0x27a/0x3a0 [ 3066.962832] ? cancel_delayed_work+0x10/0x10 [ 3066.963381] ? _raw_spin_unlock_irqrestore+0x40/0x60 [ 3066.963952] ? try_to_del_timer_sync+0xd0/0xd0 [ 3066.964467] nvme_sync_queues+0x71/0x100 [nvme_core] [ 3066.965087] nvme_reset_work+0x137/0x39f0 [nvme] [ 3066.965627] ? __lock_acquire+0x1593/0x5520 [ 3066.966173] ? nvme_queue_rqs+0xa80/0xa80 [nvme] [ 3066.966713] ? lock_acquire+0x194/0x4f0 [ 3066.967180] ? lock_acquire+0x1a4/0x4f0 [ 3066.967698] ? lock_downgrade+0x6b0/0x6b0 [ 3066.968176] ? reacquire_held_locks+0x4e0/0x4e0 [ 3066.968751] ? lock_is_held_type+0xe3/0x140 [ 3066.969247] process_one_work+0x816/0x1300 [ 3066.969746] ? lock_downgrade+0x6b0/0x6b0 [ 3066.970279] ? pwq_dec_nr_in_flight+0x230/0x230 [ 3066.970818] ? rwlock_bug.part.0+0x90/0x90 [ 3066.971303] worker_thread+0xfc/0x1270 [ 3066.971818] ? process_one_work+0x1300/0x1300 [ 3066.972330] kthread+0x29b/0x340 [ 3066.972746] ? kthread_complete_and_exit+0x20/0x20 [ 3066.973345] ret_from_fork+0x1f/0x30 [ 3066.973791] </TASK> [ 3067.963008] nvme nvme5: 4/0/0 default/read/poll queues [ 3067.972454] nvme nvme5: Ignoring bogus Namespace Identifiers [ 3098.224067] nvme nvme5: I/O 936 (I/O Cmd) QID 2 timeout, aborting [ 3098.225903] nvme nvme5: I/O 598 (I/O Cmd) QID 4 timeout, aborting [ 3098.226229] nvme nvme5: Abort status: 0x0 [ 3098.227499] nvme nvme5: I/O 599 (I/O Cmd) QID 4 timeout, aborting [ 3098.227872] nvme nvme5: Abort status: 0x0 [ 3098.231401] nvme nvme5: Abort status: 0x0 [ 3128.432035] nvme nvme5: I/O 936 QID 2 timeout, reset controller [ 3130.808541] nvme nvme5: 4/0/0 default/read/poll queues [ 3130.816416] nvme nvme5: Ignoring bogus Namespace Identifiers [ 3161.199887] nvme nvme5: I/O 936 (I/O Cmd) QID 2 timeout, aborting [ 3161.201909] nvme nvme5: Abort status: 0x0 [ 3191.407811] nvme nvme5: I/O 936 QID 2 timeout, reset controller [ 3254.895644] nvme nvme5: controller is down; will reset: CSTS=0x2, PCI_STATUS=0x10 [ 3257.299155] nvme nvme5: 4/0/0 default/read/poll queues [ 3257.312480] nvme nvme5: Ignoring bogus Namespace Identifiers -- Shin'ichiro Kawasaki