Re: [PATCH 5/7] scsi: ufs: core: Simplify ufshcd_err_handling_prepare()

Peter Wang (王信友) <peter.wang@xxxxxxxxxxxx> · Tue, 22 Oct 2024 02:38:17 +0000

On Mon, 2024-10-21 at 13:41 -0700, Bart Van Assche wrote:
>  On 10/21/24 2:43 AM, Peter Wang (王信友) wrote:
> > Using blk_mq_quiesce_tagset instead of ufshcd_scsi_block_requests
> > could cause issues. After the patch below was merged, Mediatek
> > received three cases of IO hang.
> > 77691af484e2 ("scsi: ufs: core: Quiesce request queues before checking
> > pending cmds")
> > I think this patch might need to be reverted first.
> > 
> > Here is backtrace of IO hang.
> > ppid=3952 pid=3952 D cpu=6 prio=120 wait=188s kworker/u16:0
> > vmlinux __synchronize_srcu() + 216
> > </proc/self/cwd/common/kernel/rcu/srcutree.c:1386>
> > vmlinux synchronize_srcu() + 276
> > </proc/self/cwd/common/kernel/rcu/srcutree.c:0>
> > vmlinux blk_mq_wait_quiesce_done() + 20
> > </proc/self/cwd/common/block/blk-mq.c:226>
> > vmlinux blk_mq_quiesce_tagset() + 156
> > </proc/self/cwd/common/block/blk-mq.c:286>
> > vmlinux ufshcd_clock_scaling_prepare(timeout_us=1000000) + 16
> > </proc/self/cwd/common/drivers/ufs/core/ufshcd.c:1276>
> > vmlinux ufshcd_devfreq_scale() + 52
> > </proc/self/cwd/common/drivers/ufs/core/ufshcd.c:1322>
> > vmlinux ufshcd_devfreq_target() + 384
> > </proc/self/cwd/common/drivers/ufs/core/ufshcd.c:1440>
> > vmlinux devfreq_set_target(flags=0) + 184
> > </proc/self/cwd/common/drivers/devfreq/devfreq.c:363>
> > vmlinux devfreq_update_target(freq=0) + 296
> > </proc/self/cwd/common/drivers/devfreq/devfreq.c:429>
> > vmlinux update_devfreq() + 8
> > </proc/self/cwd/common/drivers/devfreq/devfreq.c:444>
> > vmlinux devfreq_monitor() + 48
> > </proc/self/cwd/common/drivers/devfreq/devfreq.c:460>
> > vmlinux process_one_work() + 476
> > </proc/self/cwd/common/kernel/workqueue.c:2643>
> > vmlinux process_scheduled_works() + 580
> > </proc/self/cwd/common/kernel/workqueue.c:2717>
> > vmlinux worker_thread() + 576
> > </proc/self/cwd/common/kernel/workqueue.c:2798>
> > vmlinux kthread() + 272
> > </proc/self/cwd/common/kernel/kthread.c:388>
> > vmlinux 0xFFFFFFE239A164EC()
> > </proc/self/cwd/common/arch/arm64/kernel/entry.S:846>
> 
> Hi Peter,
> 
> Thank you very much for having reported this hang early. Would it be
> possible for you to test the patch below on top of this patch series?
> I think the root cause of the hang that you reported is in the block
> layer.
> 
> Thanks,
> 
> Bart.
> 
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 7b02188feed5..7482e682deca 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -283,8 +283,9 @@ void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set)
>  		if (!blk_queue_skip_tagset_quiesce(q))
>  			blk_mq_quiesce_queue_nowait(q);
>  	}
> -	blk_mq_wait_quiesce_done(set);
>  	mutex_unlock(&set->tag_list_lock);
> +
> +	blk_mq_wait_quiesce_done(set);
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_quiesce_tagset);
> 
> 

Hi Bart,

We can test this patch, but because the issue reproduces only rarely,
I'm concerned that not hitting it during testing would not necessarily
mean the problem has been truly resolved.
However, from my understanding, this patch should be able to
resolve the deadlock.
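
For illustration only, here is a small userspace sketch of the ordering
the patch introduces: mark the queues under the mutex, drop the mutex,
and only then wait for readers. It assumes the hang occurs when a task
inside the read-side section needs the same mutex the waiter is holding;
that is an assumption, the backtrace above only shows the waiter side.
All names below (list_lock, readers_active, quiesced, reader_done,
run_patched_ordering) are hypothetical stand-ins, not kernel APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER; /* plays set->tag_list_lock */
static atomic_int readers_active; /* plays "a task is inside the read-side section" */
static atomic_int quiesced;       /* plays the per-queue quiesce mark */
static atomic_int reader_done;    /* set once the reader got through list_lock */

/* Assumed reader: before leaving its read-side section it needs list_lock. */
static void *reader(void *arg)
{
	(void)arg;
	while (!atomic_load(&quiesced))
		sched_yield();            /* stay in the section until quiesce starts */
	pthread_mutex_lock(&list_lock);   /* would block forever if the waiter still held it */
	pthread_mutex_unlock(&list_lock);
	atomic_store(&reader_done, 1);
	atomic_store(&readers_active, 0); /* leave the read-side section */
	return NULL;
}

/* Patched ordering: mark under the lock, drop the lock, then wait. */
static void quiesce_then_wait(void)
{
	pthread_mutex_lock(&list_lock);
	atomic_store(&quiesced, 1);       /* analogous to blk_mq_quiesce_queue_nowait() */
	pthread_mutex_unlock(&list_lock);

	while (atomic_load(&readers_active)) /* analogous to blk_mq_wait_quiesce_done() */
		sched_yield();
}

/* Returns 1 if the waiter finished and the reader made progress. */
int run_patched_ordering(void)
{
	pthread_t t;
	int rc;

	atomic_store(&readers_active, 1); /* the reader is already inside its section */
	atomic_store(&quiesced, 0);
	atomic_store(&reader_done, 0);

	rc = pthread_create(&t, NULL, reader, NULL);
	assert(rc == 0);
	(void)rc;

	quiesce_then_wait();
	pthread_join(t, NULL);
	return atomic_load(&reader_done);
}
```

If the wait loop in quiesce_then_wait() were moved before
pthread_mutex_unlock(), the reader would block on list_lock while the
waiter spins on readers_active, which is the shape of deadlock assumed
above.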

Thanks
Peter