Re: [PATCH v3] ufs: core: wlun resume SSU(Acitve) fail recovery

Bart Van Assche <bvanassche@xxxxxxx> · Mon, 2 Jan 2023 16:29:26 -0800

On 12/27/22 22:01, peter.wang@xxxxxxxxxxxx wrote:
> When wlun resume SSU(Active) timeout, scsi try eh_host_reset_handler.
                   ^^^^^^^^^^^
Please use the same spelling in the patch subject (Acitve -> Active).

timeout -> times out
scsi try -> the SCSI core invokes

> But ufshcd_eh_host_reset_handler hang at wait flush_work(&hba->eh_work).

hang at -> hangs in

> And ufshcd_err_handler hang at wait rpm resume.

hang at wait rpm resume -> hangs in rpm_resume().

> <ffffffdd78e02b34> schedule+0x110/0x204
> <ffffffdd78e0be60> schedule_timeout+0x98/0x138
> <ffffffdd78e040e8> wait_for_common_io+0x130/0x2d0
> <ffffffdd77d6a000> blk_execute_rq+0x10c/0x16c
> <ffffffdd78126d90> __scsi_execute+0xfc/0x278
> <ffffffdd7813891c> ufshcd_set_dev_pwr_mode+0x1c8/0x40c
> <ffffffdd78137d1c> __ufshcd_wl_resume+0xf0/0x5cc
> <ffffffdd78137ae0> ufshcd_wl_runtime_resume+0x40/0x18c
> <ffffffdd78136108> scsi_runtime_resume+0x88/0x104
> <ffffffdd7809a4f8> __rpm_callback+0x1a0/0xaec
> <ffffffdd7809b624> rpm_resume+0x7e0/0xcd0
> <ffffffdd7809a788> __rpm_callback+0x430/0xaec
> <ffffffdd7809b644> rpm_resume+0x800/0xcd0
> <ffffffdd780a0778> pm_runtime_work+0x148/0x198
>
> <ffffffdd78e02b34> schedule+0x110/0x204
> <ffffffdd78e0be10> schedule_timeout+0x48/0x138
> <ffffffdd78e03d9c> wait_for_common+0x144/0x2dc
> <ffffffdd7758bba4> __flush_work+0x3d0/0x508
> <ffffffdd7815572c> ufshcd_eh_host_reset_handler+0x134/0x3a8
> <ffffffdd781216f4> scsi_try_host_reset+0x54/0x204
> <ffffffdd78120594> scsi_eh_ready_devs+0xb30/0xd48
> <ffffffdd7812373c> scsi_error_handler+0x260/0x874
>
> <ffffffdd78e02b34> schedule+0x110/0x204
> <ffffffdd7809af64> rpm_resume+0x120/0xcd0
> <ffffffdd7809fde8> __pm_runtime_resume+0xa0/0x17c
> <ffffffdd7815193c> ufshcd_err_handling_prepare+0x40/0x430
> <ffffffdd7814cce8> ufshcd_err_handler+0x1c4/0xd4c

On top of which kernel version has this patch been developed?
I think this deadlock has already been fixed by commit 7029e2151a7c 
("scsi: ufs: Fix a deadlock between PM and the SCSI error handler").
Please check whether that commit by itself (without this patch) is 
sufficient to fix the reported deadlock.

> ---
>   drivers/ufs/core/ufshcd.c | 17 +++++++++++++++++
>   1 file changed, 17 insertions(+)

The changelog is missing. Please include a changelog when posting v2 or 
a later version of a patch.

> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index e18c9f4463ec..0dfb9a35bf66 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -7366,6 +7366,23 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd)
>
>   	hba = shost_priv(cmd->device->host);
>
> +	/*
> +	 * If pm op resume fail and wait err recovery, do link recovery only.
> +	 * Because schedule eh work will get dead lock in ufshcd_rpm_get_sync
> +	 * and wait wlun resume, but wlun resume error wait eh work finish.
> +	 */

The above comment has grammar issues and some parts are 
incomprehensible. What does e.g. "wait err recovery" mean? Please 
improve this source code comment.

Thanks,

Bart.