lockdep WARNING at blktests block/011

Kernel v6.0-rc7 triggers the WARN "possible circular locking dependency
detected" at blktests test case block/011 for NVMe devices [1]. The trigger
commit is c0feea594e05 ("workqueue: don't skip lockdep work dependency in
cancel_work_sync()"). The commit was backported to the v5.15.71 stable
kernel, and the WARN is observed with that stable kernel as well. It looks
like the possible deadlock scenario has been hidden for a long time and the
commit has now revealed it.

As its title says, the commit restored the lockdep work dependency in
cancel_work_sync(), which is called from nvme_reset_work() via
blk_sync_queue(). The test case repeatedly disables the PCI device during
I/O on the PCI NVMe device, so nvme_reset_work() is called repeatedly.
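
For reference, the reset-side path looks roughly as follows in v6.0, as far
as I understand it (paraphrased from drivers/nvme/host/core.c and
block/blk-core.c, not verbatim; nvme_sync_io_queues() is most likely inlined
into nvme_sync_queues() in the splat below):

  void nvme_sync_io_queues(struct nvme_ctrl *ctrl)
  {
          struct nvme_ns *ns;

          /* lock #2 in the "3 locks held" list below, read only */
          down_read(&ctrl->namespaces_rwsem);
          list_for_each_entry(ns, &ctrl->namespaces, list)
                  blk_sync_queue(ns->queue);
          up_read(&ctrl->namespaces_rwsem);
  }

  void blk_sync_queue(struct request_queue *q)
  {
          del_timer_sync(&q->timeout);
          /* -> #0: waits for q->timeout_work while namespaces_rwsem is read-held */
          cancel_work_sync(&q->timeout_work);
  }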

The WARN was observed with a QEMU-emulated NVMe device. I also tried two of
my real NVMe devices: one of them recreated the WARN, the other did not.

I tried to understand the deadlock scenario, but I cannot tell whether it is
real or a false positive. Help understanding the scenario would be
appreciated. The lockdep splat mentions three locks (ctrl->namespaces_rwsem,
dev->shutdown_lock and the q->timeout_work completion). In the related
function call paths, ctrl->namespaces_rwsem appears to be taken only for
read, so the deadlock scenario does not look possible to me. (Maybe I'm
misunderstanding something...)
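
The timeout-side path, per the #1 and #2 stacks in the splat, takes the
locks in the opposite order, roughly like this (again paraphrased and
heavily abbreviated, not verbatim v6.0 code):

  static void nvme_dev_disable(struct nvme_dev *dev, bool shutdown)
  {
          /* -> #1: shutdown_lock taken in q->timeout_work context */
          mutex_lock(&dev->shutdown_lock);

          /* ... controller shutdown/quiesce details elided ... */

          /* -> #2: takes namespaces_rwsem below */
          nvme_start_freeze(&dev->ctrl);

          /* ... */
          mutex_unlock(&dev->shutdown_lock);
  }

  void nvme_start_freeze(struct nvme_ctrl *ctrl)
  {
          struct nvme_ns *ns;

          /* again read-locked only */
          down_read(&ctrl->namespaces_rwsem);
          list_for_each_entry(ns, &ctrl->namespaces, list)
                  blk_freeze_queue_start(ns->queue);
          up_read(&ctrl->namespaces_rwsem);
  }

This whole path runs from blk_mq_timeout_work(), i.e. in the q->timeout_work
context that nvme_reset_work() ends up flushing above.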

[1]

[ 3033.791353] run blktests block/011 at 2022-09-27 13:24:34
[ 3064.944407] nvme nvme5: controller is down; will reset: CSTS=0x2, PCI_STATUS=0x10

[ 3066.531825] ======================================================
[ 3066.533372] WARNING: possible circular locking dependency detected
[ 3066.534914] 6.0.0-rc7 #1 Not tainted
[ 3066.535891] ------------------------------------------------------
[ 3066.537492] kworker/u8:0/904 is trying to acquire lock:
[ 3066.538807] ffff888102f24788 ((work_completion)(&q->timeout_work)){+.+.}-{0:0}, at: __flush_work+0xc2/0x900
[ 3066.541161] 
               but task is already holding lock:
[ 3066.542793] ffff888116840528 (&ctrl->namespaces_rwsem){++++}-{3:3}, at: nvme_sync_queues+0x26/0x100 [nvme_core]
[ 3066.545294] 
               which lock already depends on the new lock.

[ 3066.547563] 
               the existing dependency chain (in reverse order) is:
[ 3066.549530] 
               -> #2 (&ctrl->namespaces_rwsem){++++}-{3:3}:
[ 3066.551431]        down_read+0x3e/0x50
[ 3066.552421]        nvme_start_freeze+0x24/0xb0 [nvme_core]
[ 3066.553862]        nvme_dev_disable+0x4ee/0x660 [nvme]
[ 3066.555208]        nvme_timeout.cold+0x3f2/0x699 [nvme]
[ 3066.556575]        blk_mq_check_expired+0x19a/0x270
[ 3066.557865]        bt_iter+0x21e/0x300
[ 3066.558935]        blk_mq_queue_tag_busy_iter+0x7fd/0x1240
[ 3066.560356]        blk_mq_timeout_work+0x143/0x380
[ 3066.562602]        process_one_work+0x816/0x1300
[ 3066.563847]        worker_thread+0xfc/0x1270
[ 3066.564942]        kthread+0x29b/0x340
[ 3066.565973]        ret_from_fork+0x1f/0x30
[ 3066.567083] 
               -> #1 (&dev->shutdown_lock){+.+.}-{3:3}:
[ 3066.568851]        __mutex_lock+0x14c/0x12b0
[ 3066.570010]        nvme_dev_disable+0x5d/0x660 [nvme]
[ 3066.571309]        nvme_timeout.cold+0x3f2/0x699 [nvme]
[ 3066.572678]        blk_mq_check_expired+0x19a/0x270
[ 3066.573972]        bt_iter+0x21e/0x300
[ 3066.575024]        blk_mq_queue_tag_busy_iter+0x7fd/0x1240
[ 3066.576431]        blk_mq_timeout_work+0x143/0x380
[ 3066.577688]        process_one_work+0x816/0x1300
[ 3066.578882]        worker_thread+0xfc/0x1270
[ 3066.580046]        kthread+0x29b/0x340
[ 3066.581081]        ret_from_fork+0x1f/0x30
[ 3066.582194] 
               -> #0 ((work_completion)(&q->timeout_work)){+.+.}-{0:0}:
[ 3066.584265]        __lock_acquire+0x2843/0x5520
[ 3066.585426]        lock_acquire+0x194/0x4f0
[ 3066.586458]        __flush_work+0xe2/0x900
[ 3066.587484]        __cancel_work_timer+0x27a/0x3a0
[ 3066.588568]        nvme_sync_queues+0x71/0x100 [nvme_core]
[ 3066.589782]        nvme_reset_work+0x137/0x39f0 [nvme]
[ 3066.590896]        process_one_work+0x816/0x1300
[ 3066.591895]        worker_thread+0xfc/0x1270
[ 3066.592795]        kthread+0x29b/0x340
[ 3066.593654]        ret_from_fork+0x1f/0x30
[ 3066.594569] 
               other info that might help us debug this:

[ 3066.596290] Chain exists of:
                 (work_completion)(&q->timeout_work) --> &dev->shutdown_lock --> &ctrl->namespaces_rwsem

[ 3066.598960]  Possible unsafe locking scenario:

[ 3066.600167]        CPU0                    CPU1
[ 3066.600985]        ----                    ----
[ 3066.601779]   lock(&ctrl->namespaces_rwsem);
[ 3066.602506]                                lock(&dev->shutdown_lock);
[ 3066.603597]                                lock(&ctrl->namespaces_rwsem);
[ 3066.604739]   lock((work_completion)(&q->timeout_work));
[ 3066.605653] 
                *** DEADLOCK ***

[ 3066.606728] 3 locks held by kworker/u8:0/904:
[ 3066.607436]  #0: ffff888111fe3138 ((wq_completion)nvme-reset-wq){+.+.}-{0:0}, at: process_one_work+0x73c/0x1300
[ 3066.608959]  #1: ffff88811ddcfdd0 ((work_completion)(&dev->ctrl.reset_work)){+.+.}-{0:0}, at: process_one_work+0x769/0x1300
[ 3066.610491]  #2: ffff888116840528 (&ctrl->namespaces_rwsem){++++}-{3:3}, at: nvme_sync_queues+0x26/0x100 [nvme_core]
[ 3066.612010] 
               stack backtrace:
[ 3066.612819] CPU: 1 PID: 904 Comm: kworker/u8:0 Not tainted 6.0.0-rc7 #1
[ 3066.613951] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[ 3066.615613] Workqueue: nvme-reset-wq nvme_reset_work [nvme]
[ 3066.616532] Call Trace:
[ 3066.616985]  <TASK>
[ 3066.617378]  dump_stack_lvl+0x5b/0x77
[ 3066.618057]  check_noncircular+0x242/0x2f0
[ 3066.618734]  ? lock_chain_count+0x20/0x20
[ 3066.727613]  ? print_circular_bug+0x1e0/0x1e0
[ 3066.950865]  ? unwind_next_frame+0x21b/0x1e60
[ 3066.951388]  ? ret_from_fork+0x1f/0x30
[ 3066.951855]  ? lockdep_lock+0xbc/0x1a0
[ 3066.952361]  ? call_rcu_zapped+0xc0/0xc0
[ 3066.952835]  __lock_acquire+0x2843/0x5520
[ 3066.953315]  ? stack_trace_save+0x81/0xa0
[ 3066.953845]  ? lockdep_hardirqs_on_prepare+0x410/0x410
[ 3066.954439]  lock_acquire+0x194/0x4f0
[ 3066.954931]  ? __flush_work+0xc2/0x900
[ 3066.955389]  ? lock_downgrade+0x6b0/0x6b0
[ 3066.955865]  ? lockdep_unlock+0xf2/0x250
[ 3066.956380]  ? __lock_acquire+0x2080/0x5520
[ 3066.956878]  __flush_work+0xe2/0x900
[ 3066.957316]  ? __flush_work+0xc2/0x900
[ 3066.957813]  ? mark_lock+0xee/0x1610
[ 3066.958249]  ? queue_delayed_work_on+0x90/0x90
[ 3066.958774]  ? lock_chain_count+0x20/0x20
[ 3066.959312]  ? lock_acquire+0x194/0x4f0
[ 3066.959776]  ? lock_acquire+0x1a4/0x4f0
[ 3066.960233]  ? lock_is_held_type+0xe3/0x140
[ 3066.960777]  ? find_held_lock+0x2d/0x110
[ 3066.961247]  ? mark_held_locks+0x9e/0xe0
[ 3066.961707]  ? lockdep_hardirqs_on_prepare+0x17b/0x410
[ 3066.962330]  __cancel_work_timer+0x27a/0x3a0
[ 3066.962832]  ? cancel_delayed_work+0x10/0x10
[ 3066.963381]  ? _raw_spin_unlock_irqrestore+0x40/0x60
[ 3066.963952]  ? try_to_del_timer_sync+0xd0/0xd0
[ 3066.964467]  nvme_sync_queues+0x71/0x100 [nvme_core]
[ 3066.965087]  nvme_reset_work+0x137/0x39f0 [nvme]
[ 3066.965627]  ? __lock_acquire+0x1593/0x5520
[ 3066.966173]  ? nvme_queue_rqs+0xa80/0xa80 [nvme]
[ 3066.966713]  ? lock_acquire+0x194/0x4f0
[ 3066.967180]  ? lock_acquire+0x1a4/0x4f0
[ 3066.967698]  ? lock_downgrade+0x6b0/0x6b0
[ 3066.968176]  ? reacquire_held_locks+0x4e0/0x4e0
[ 3066.968751]  ? lock_is_held_type+0xe3/0x140
[ 3066.969247]  process_one_work+0x816/0x1300
[ 3066.969746]  ? lock_downgrade+0x6b0/0x6b0
[ 3066.970279]  ? pwq_dec_nr_in_flight+0x230/0x230
[ 3066.970818]  ? rwlock_bug.part.0+0x90/0x90
[ 3066.971303]  worker_thread+0xfc/0x1270
[ 3066.971818]  ? process_one_work+0x1300/0x1300
[ 3066.972330]  kthread+0x29b/0x340
[ 3066.972746]  ? kthread_complete_and_exit+0x20/0x20
[ 3066.973345]  ret_from_fork+0x1f/0x30
[ 3066.973791]  </TASK>
[ 3067.963008] nvme nvme5: 4/0/0 default/read/poll queues
[ 3067.972454] nvme nvme5: Ignoring bogus Namespace Identifiers
[ 3098.224067] nvme nvme5: I/O 936 (I/O Cmd) QID 2 timeout, aborting
[ 3098.225903] nvme nvme5: I/O 598 (I/O Cmd) QID 4 timeout, aborting
[ 3098.226229] nvme nvme5: Abort status: 0x0
[ 3098.227499] nvme nvme5: I/O 599 (I/O Cmd) QID 4 timeout, aborting
[ 3098.227872] nvme nvme5: Abort status: 0x0
[ 3098.231401] nvme nvme5: Abort status: 0x0
[ 3128.432035] nvme nvme5: I/O 936 QID 2 timeout, reset controller
[ 3130.808541] nvme nvme5: 4/0/0 default/read/poll queues
[ 3130.816416] nvme nvme5: Ignoring bogus Namespace Identifiers
[ 3161.199887] nvme nvme5: I/O 936 (I/O Cmd) QID 2 timeout, aborting
[ 3161.201909] nvme nvme5: Abort status: 0x0
[ 3191.407811] nvme nvme5: I/O 936 QID 2 timeout, reset controller
[ 3254.895644] nvme nvme5: controller is down; will reset: CSTS=0x2, PCI_STATUS=0x10
[ 3257.299155] nvme nvme5: 4/0/0 default/read/poll queues
[ 3257.312480] nvme nvme5: Ignoring bogus Namespace Identifiers

-- 
Shin'ichiro Kawasaki


