Hello Linux block team,

We are running a suspend/resume test together with some I/O tests, and
suspend gets stuck in blk_mq_freeze_queue_wait() for more than 20 seconds.
We observed that blk_mq_freeze_queue_wait() is waiting for a
q_usage_counter reference to be put by a userspace task that is in the
TASK_FROZEN state.
kworker/u17:3 task B ['TASK_UNINTERRUPTIBLE']
[<ffffffdc0c527f10>] __switch_to+0x1e8
[<ffffffdc0c5287ec>] __schedule+0x6cc
[<ffffffdc0c528c28>] schedule+0x78
[<ffffffdc0bb4b828>] blk_mq_freeze_queue_wait[jt]+0x60
[<ffffffdc0bb4ba10>] blk_freeze_queue+0x70
[<ffffffdc0bb4ba34>] blk_mq_freeze_queue+0x10
[<ffffffdc0be4ba00>] scsi_device_quiesce+0x4c
[<ffffffdc0be55f80>] scsi_bus_suspend+0x48
[<ffffffdc0bdea798>] dpm_run_callback+0x9c
[<ffffffdc0bdec62c>] __device_suspend+0x404
[<ffffffdc0bdec0ac>] async_suspend+0x30
[<ffffffdc0b4f301c>] async_run_entry_fn+0x4c
[<ffffffdc0b4e5034>] process_one_work+0x254
[<ffffffdc0b4e5828>] worker_thread+0x274
[<ffffffdc0b4ecdfc>] kthread+0x110
[<ffffffdc0b4168ec>] ret_from_fork+0x10
[ND:0x0::0xFFFFFF881DAF39A0] q_usage_counter = (
[ND:0x0::0xFFFFFF881DAF39A0] percpu_count_ptr = 0x0000004E8F857E9B, // __PERCPU_REF_DEAD and __PERCPU_REF_ATOMIC are set
[ND:0x0::0xFFFFFF881DAF39A8] data = 0xFFFFFF8818EC9F80 -> (
[ND:0x0::0xFFFFFF8818EC9F80] count = (
[ND:0x0::0xFFFFFF8818EC9F80] counter = 0x3),
So q_usage_counter is 3 and mq_freeze_depth = 0x1.

Userspace task A is in the TASK_FROZEN state. It called
percpu_ref_get(&this_hctx->queue->q_usage_counter) from
blk_mq_dispatch_plug_list(), but has not yet called
percpu_ref_put(&this_hctx->queue->q_usage_counter), so the counter cannot
drop to zero while the task stays frozen.

This looks like a deadlock. Could you give us some debugging suggestions?

userspace task A ['TASK_FROZEN']
[<ffffffdc0c527f10>] __switch_to+0x1e8
[<ffffffdc0c5287ec>] __schedule+0x6cc
[<ffffffdc0c528c28>] schedule+0x78
[<ffffffdc0c532bbc>] schedule_timeout+0x50
[<ffffffdc0c529e48>] do_wait_for_common+0x10c
[<ffffffdc0c529238>] wait_for_completion+0x48
[<ffffffdc0b4def4c>] __flush_work+0xcc
[<ffffffdc0b4dee70>] flush_work+0x14
[<ffffffdc0c091aec>] ufshcd_hold+0xc0
[<ffffffdc0c0a3bdc>] ufshcd_queuecommand+0xc0
[<ffffffdc0be4d2bc>] scsi_queue_rq+0x89c
[<ffffffdc0bb4f6a0>] blk_mq_dispatch_rq_list+0x3c8
[<ffffffdc0bb59218>] __blk_mq_sched_dispatch_requests+0x430
[<ffffffdc0bb58db4>] blk_mq_sched_dispatch_requests+0x38
[<ffffffdc0bb4e69c>] blk_mq_run_hw_queue+0x258
[<ffffffdc0bb503a0>] blk_mq_flush_plug_list+0xc4
[<ffffffdc0bb425e4>] __blk_flush_plug+0x118
[<ffffffdc0bb42658>] blk_finish_plug+0x28
[<ffffffdc0b6fc7a0>] read_pages+0x31c
[<ffffffdc0b6fc318>] page_cache_ra_unbounded+0x88
[<ffffffdc0b6fcb9c>] page_cache_ra_order+0x270
[<ffffffdc0b6ef380>] do_sync_mmap_readahead+0xd0
[<ffffffdc0b6eeea8>] filemap_fault+0x14c
[<ffffffdc0b748eb4>] handle_mm_fault+0x4fc
[<ffffffdc0c536d40>] do_page_fault+0x294
[<ffffffdc0c536a90>] do_translation_fault[jt]+0x40
[<ffffffdc0b43fae4>] do_mem_abort+0x58
[<ffffffdc0c51ed70>] el0_da+0x48
[<ffffffdc0c51ec90>] el0t_64_sync_handler[jt]+0xb0
[<ffffffdc0b411588>] ret_to_user[jt]+0x0
BR
Kassey