On Thu, Aug 15, 2024 at 10:47:36AM +0800, Li Lingfeng wrote:
> From: Li Lingfeng <lilingfeng3@xxxxxxxxxx>
>
> Lockdep reported a warning in Linux version 6.6:
>
> [ 414.344659] ================================
> [ 414.345155] WARNING: inconsistent lock state
> [ 414.345658] 6.6.0-07439-gba2303cacfda #6 Not tainted
> [ 414.346221] --------------------------------
> [ 414.346712] inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage.
> [ 414.347545] kworker/u10:3/1152 [HC0[0]:SC0[0]:HE0:SE1] takes:
> [ 414.349245] ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
> [ 414.351204] {IN-SOFTIRQ-W} state was registered at:
> [ 414.351751] lock_acquire+0x18d/0x460
> [ 414.352218] _raw_spin_lock_irqsave+0x39/0x60
> [ 414.352769] __wake_up_common_lock+0x22/0x60
> [ 414.353289] sbitmap_queue_wake_up+0x375/0x4f0
> [ 414.353829] sbitmap_queue_clear+0xdd/0x270
> [ 414.354338] blk_mq_put_tag+0xdf/0x170
> [ 414.354807] __blk_mq_free_request+0x381/0x4d0
> [ 414.355335] blk_mq_free_request+0x28b/0x3e0
> [ 414.355847] __blk_mq_end_request+0x242/0xc30
> [ 414.356367] scsi_end_request+0x2c1/0x830
> [ 414.356863] scsi_io_completion+0x177/0x1610
> [ 414.357379] scsi_complete+0x12f/0x260
> [ 414.357856] blk_complete_reqs+0xba/0xf0
> [ 414.358338] __do_softirq+0x1b0/0x7a2
> [ 414.358796] irq_exit_rcu+0x14b/0x1a0
> [ 414.359262] sysvec_call_function_single+0xaf/0xc0
> [ 414.359828] asm_sysvec_call_function_single+0x1a/0x20
> [ 414.360426] default_idle+0x1e/0x30
> [ 414.360873] default_idle_call+0x9b/0x1f0
> [ 414.361390] do_idle+0x2d2/0x3e0
> [ 414.361819] cpu_startup_entry+0x55/0x60
> [ 414.362314] start_secondary+0x235/0x2b0
> [ 414.362809] secondary_startup_64_no_verify+0x18f/0x19b
> [ 414.363413] irq event stamp: 428794
> [ 414.363825] hardirqs last enabled at (428793): [<ffffffff816bfd1c>] ktime_get+0x1dc/0x200
> [ 414.364694] hardirqs last disabled at (428794): [<ffffffff85470177>] _raw_spin_lock_irq+0x47/0x50
> [ 414.365629] softirqs last enabled at (428444): [<ffffffff85474780>] __do_softirq+0x540/0x7a2
> [ 414.366522] softirqs last disabled at (428419): [<ffffffff813f65ab>] irq_exit_rcu+0x14b/0x1a0
> [ 414.367425] other info that might help us debug this:
> [ 414.368194] Possible unsafe locking scenario:
> [ 414.368900] CPU0
> [ 414.369225] ----
> [ 414.369548] lock(&sbq->ws[i].wait);
> [ 414.370000] <Interrupt>
> [ 414.370342] lock(&sbq->ws[i].wait);
> [ 414.370802] *** DEADLOCK ***
> [ 414.371569] 5 locks held by kworker/u10:3/1152:
> [ 414.372088] #0: ffff88810130e938 ((wq_completion)writeback){+.+.}-{0:0}, at: process_scheduled_works+0x357/0x13f0
> [ 414.373180] #1: ffff88810201fdb8 ((work_completion)(&(&wb->dwork)->work)){+.+.}-{0:0}, at: process_scheduled_works+0x3a3/0x13f0
> [ 414.374384] #2: ffffffff86ffbdc0 (rcu_read_lock){....}-{1:2}, at: blk_mq_run_hw_queue+0x637/0xa00
> [ 414.375342] #3: ffff88810edd1098 (&sbq->ws[i].wait){+.?.}-{2:2}, at: blk_mq_dispatch_rq_list+0x131c/0x1ee0
> [ 414.376377] #4: ffff888106205a08 (&hctx->dispatch_wait_lock){+.-.}-{2:2}, at: blk_mq_dispatch_rq_list+0x1337/0x1ee0
> [ 414.378607] stack backtrace:
> [ 414.379177] CPU: 0 PID: 1152 Comm: kworker/u10:3 Not tainted 6.6.0-07439-gba2303cacfda #6
> [ 414.380032] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014
> [ 414.381177] Workqueue: writeback wb_workfn (flush-253:0)
> [ 414.381805] Call Trace:
> [ 414.382136] <TASK>
> [ 414.382429] dump_stack_lvl+0x91/0xf0
> [ 414.382884] mark_lock_irq+0xb3b/0x1260
> [ 414.383367] ? __pfx_mark_lock_irq+0x10/0x10
> [ 414.383889] ? stack_trace_save+0x8e/0xc0
> [ 414.384373] ? __pfx_stack_trace_save+0x10/0x10
> [ 414.384903] ? graph_lock+0xcf/0x410
> [ 414.385350] ? save_trace+0x3d/0xc70
> [ 414.385808] mark_lock.part.20+0x56d/0xa90
> [ 414.386317] mark_held_locks+0xb0/0x110
> [ 414.386791] ? __pfx_do_raw_spin_lock+0x10/0x10
> [ 414.387320] lockdep_hardirqs_on_prepare+0x297/0x3f0
> [ 414.387901] ? _raw_spin_unlock_irq+0x28/0x50
> [ 414.388422] trace_hardirqs_on+0x58/0x100
> [ 414.388917] _raw_spin_unlock_irq+0x28/0x50
> [ 414.389422] __blk_mq_tag_busy+0x1d6/0x2a0
> [ 414.389920] __blk_mq_get_driver_tag+0x761/0x9f0
> [ 414.390899] blk_mq_dispatch_rq_list+0x1780/0x1ee0
> [ 414.391473] ? __pfx_blk_mq_dispatch_rq_list+0x10/0x10
> [ 414.392070] ? sbitmap_get+0x2b8/0x450
> [ 414.392533] ? __blk_mq_get_driver_tag+0x210/0x9f0
> [ 414.393095] __blk_mq_sched_dispatch_requests+0xd99/0x1690
> [ 414.393730] ? elv_attempt_insert_merge+0x1b1/0x420
> [ 414.394302] ? __pfx___blk_mq_sched_dispatch_requests+0x10/0x10
> [ 414.394970] ? lock_acquire+0x18d/0x460
> [ 414.395456] ? blk_mq_run_hw_queue+0x637/0xa00
> [ 414.395986] ? __pfx_lock_acquire+0x10/0x10
> [ 414.396499] blk_mq_sched_dispatch_requests+0x109/0x190
> [ 414.397100] blk_mq_run_hw_queue+0x66e/0xa00
> [ 414.397616] blk_mq_flush_plug_list.part.17+0x614/0x2030
> [ 414.398244] ? __pfx_blk_mq_flush_plug_list.part.17+0x10/0x10
> [ 414.398897] ? writeback_sb_inodes+0x241/0xcc0
> [ 414.399429] blk_mq_flush_plug_list+0x65/0x80
> [ 414.399957] __blk_flush_plug+0x2f1/0x530
> [ 414.400458] ? __pfx___blk_flush_plug+0x10/0x10
> [ 414.400999] blk_finish_plug+0x59/0xa0
> [ 414.401467] wb_writeback+0x7cc/0x920
> [ 414.401935] ? __pfx_wb_writeback+0x10/0x10
> [ 414.402442] ? mark_held_locks+0xb0/0x110
> [ 414.402931] ? __pfx_do_raw_spin_lock+0x10/0x10
> [ 414.403462] ? lockdep_hardirqs_on_prepare+0x297/0x3f0
> [ 414.404062] wb_workfn+0x2b3/0xcf0
> [ 414.404500] ? __pfx_wb_workfn+0x10/0x10
> [ 414.404989] process_scheduled_works+0x432/0x13f0
> [ 414.405546] ? __pfx_process_scheduled_works+0x10/0x10
> [ 414.406139] ? do_raw_spin_lock+0x101/0x2a0
> [ 414.406641] ? assign_work+0x19b/0x240
> [ 414.407106] ? lock_is_held_type+0x9d/0x110
> [ 414.407604] worker_thread+0x6f2/0x1160
> [ 414.408075] ? __kthread_parkme+0x62/0x210
> [ 414.408572] ? lockdep_hardirqs_on_prepare+0x297/0x3f0
> [ 414.409168] ? __kthread_parkme+0x13c/0x210
> [ 414.409678] ? __pfx_worker_thread+0x10/0x10
> [ 414.410191] kthread+0x33c/0x440
> [ 414.410602] ? __pfx_kthread+0x10/0x10
> [ 414.411068] ret_from_fork+0x4d/0x80
> [ 414.411526] ? __pfx_kthread+0x10/0x10
> [ 414.411993] ret_from_fork_asm+0x1b/0x30
> [ 414.412489] </TASK>
>
> The warning is thrown because interrupts get re-enabled while a lock
> acquired with spin_lock_irq() is still held, which opens a window for
> a potential deadlock:
>
> blk_mq_prep_dispatch_rq
>  blk_mq_get_driver_tag
>   __blk_mq_get_driver_tag
>    __blk_mq_alloc_driver_tag
>     blk_mq_tag_busy -> tag is already busy
>    // failed to get driver tag
>  blk_mq_mark_tag_wait
>   spin_lock_irq(&wq->lock) -> lock A (&sbq->ws[i].wait)
>   __add_wait_queue(wq, wait) -> wait queue active
>   blk_mq_get_driver_tag
>    __blk_mq_tag_busy
>    -> 1) tag must be idle, which means there can't be inflight IO
>     spin_lock_irq(&tags->lock) -> lock B (hctx->tags)
>     spin_unlock_irq(&tags->lock) -> unlock B, turn on interrupt accidentally
>    -> 2) context must be preempted by IO interrupt to trigger deadlock.
>
> As shown above, the deadlock is not possible in theory, but the warning
> still needs to be fixed.
>
> Fix it by using spin_lock_irqsave() to take lock B instead of
> spin_lock_irq().
>
> Fixes: 4f1731df60f9 ("blk-mq: fix potential io hang by wrong 'wake_batch'")
> Signed-off-by: Li Lingfeng <lilingfeng3@xxxxxxxxxx>

Reviewed-by: Ming Lei <ming.lei@xxxxxxxxxx>

Thanks,
Ming
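
[For readers following the call chain above: the fix amounts to making
the inner tags->lock acquisition preserve the caller's interrupt state,
because __blk_mq_tag_busy() can be reached while interrupts are already
disabled by the spin_lock_irq(&wq->lock) in blk_mq_mark_tag_wait(). A
minimal sketch of the pattern, with hypothetical outer_lock/inner_lock
standing in for &sbq->ws[i].wait's lock and &tags->lock; this is a
simplified illustration, not the actual blk-mq source:

	#include <linux/spinlock.h>

	static DEFINE_SPINLOCK(outer_lock);	/* stands in for wq->lock */
	static DEFINE_SPINLOCK(inner_lock);	/* stands in for tags->lock */

	/*
	 * May be called both with and without interrupts already disabled
	 * by the caller, so it must not unconditionally re-enable them.
	 */
	static void inner_update(void)
	{
		unsigned long flags;

		/*
		 * A spin_lock_irq()/spin_unlock_irq() pair would be wrong
		 * here: spin_unlock_irq() re-enables interrupts even when
		 * the caller still holds outer_lock taken with
		 * spin_lock_irq(). The irqsave/irqrestore variants save
		 * and restore the interrupt state instead.
		 */
		spin_lock_irqsave(&inner_lock, flags);
		/* ... update shared state under inner_lock ... */
		spin_unlock_irqrestore(&inner_lock, flags);
	}

	static void outer_path(void)
	{
		spin_lock_irq(&outer_lock);	/* interrupts off from here */
		inner_update();			/* must leave them off */
		spin_unlock_irq(&outer_lock);	/* interrupts back on */
	}
]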