Re: [PATCH v1] ufs: core: bypass get rpm when err handling with pm_op_in_progress

Peter Wang <peter.wang@xxxxxxxxxxxx> · Mon, 19 Sep 2022 22:47:59 +0800

On 9/17/22 05:39, Bart Van Assche wrote:
On 9/15/22 04:58, peter.wang@xxxxxxxxxxxx wrote:
-static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
+static void ufshcd_err_handling_prepare(struct ufs_hba *hba, bool *rpm_put)
 {
-    ufshcd_rpm_get_sync(hba);
+    if (!hba->pm_op_in_progress) {
+        ufshcd_rpm_get_sync(hba);
+        *rpm_put = true;
+    }
+

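For illustration only, here is a user-space model of the conditional reference taken in the hunk above, together with the matching put that ufshcd_err_handling_unprepare() would presumably need. The quoted hunk shows neither the unprepare side nor where rpm_put is initialized, so this sketch assumes the flag starts out false. struct mock_hba and all mock_* helpers are hypothetical stand-ins that only track a usage count; they are not the ufshcd API.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical user-space stand-in for struct ufs_hba: it only models the
 * flag checked by the patch and the runtime PM usage count.
 */
struct mock_hba {
	bool pm_op_in_progress;	/* a suspend/resume callback is running */
	int rpm_usage_count;	/* models the usage count behind ufshcd_rpm_*() */
};

/* Stand-in for ufshcd_rpm_get_sync(): only takes a reference here. */
static void mock_rpm_get_sync(struct mock_hba *hba)
{
	hba->rpm_usage_count++;
}

/* Stand-in for ufshcd_rpm_put(): asserts a reference is held, then drops it. */
static void mock_rpm_put(struct mock_hba *hba)
{
	assert(hba->rpm_usage_count > 0);
	hba->rpm_usage_count--;
}

/* Mirrors the quoted hunk: skip the get while a PM operation is running. */
static void mock_err_handling_prepare(struct mock_hba *hba, bool *rpm_put)
{
	*rpm_put = false;	/* assumption: the hunk does not show where this is cleared */
	if (!hba->pm_op_in_progress) {
		mock_rpm_get_sync(hba);
		*rpm_put = true;
	}
}

/* Assumed counterpart: drop the reference only if prepare took one. */
static void mock_err_handling_unprepare(struct mock_hba *hba, bool rpm_put)
{
	if (rpm_put)
		mock_rpm_put(hba);
}
```

Whichever branch prepare takes in this model, the get/put pair stays balanced, which appears to be what the rpm_put flag is for.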
Hi Peter,

I don't think that this patch is sufficient. If 
ufshcd_err_handling_prepare() is called by the host reset handler 
(ufshcd_eh_host_reset_handler()) then the host state will be 
SHOST_RECOVERY. In that state SCSI command submission will hang and 
hence any ufshcd_rpm_get_sync() call will hang.

How about removing the ufshcd_rpm_get_sync() call from 
ufshcd_err_handling_prepare() and the ufshcd_rpm_put() call from 
ufshcd_err_handling_unprepare()? It is guaranteed that no commands are 
in progress for a runtime suspended LUN so the code for aborting 
pending requests in the UFS error handler will be skipped anyway if it 
is invoked for a runtime suspended device.

Thanks,

Bart.

Hi Bart,

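If a SCSI error occurred and ufshcd_eh_host_reset_handler() has to run,
the runtime PM state should already be RPM_ACTIVE, because SCSI must
resume the suspended LUN before it can send it a command, and only then
can that command fail, right?
In that case ufshcd_rpm_get_sync() only increments the usage count
without invoking the resume callback, so it should not hang.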
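A simplified user-space model of that argument, assuming pm_runtime_get_sync() semantics reduced to "increment the usage count, then resume only if not already active". The mock_* names are hypothetical; the resume callback stands in for the step that would need to send SCSI commands and could block in SHOST_RECOVERY.

```c
#include <assert.h>

/* Hypothetical subset of the runtime PM states. */
enum mock_rpm_status { MOCK_RPM_ACTIVE, MOCK_RPM_SUSPENDED };

struct mock_dev {
	enum mock_rpm_status status;
	int usage_count;
	int resume_calls;	/* how often the resume callback ran */
};

/*
 * Stand-in for a runtime resume callback: the step that would have to
 * issue SCSI commands in the scenario under discussion.
 */
static void mock_runtime_resume(struct mock_dev *dev)
{
	assert(dev->status != MOCK_RPM_ACTIVE);	/* only reached when not active */
	dev->resume_calls++;
	dev->status = MOCK_RPM_ACTIVE;
}

/* Simplified get_sync: bump the count, resume only if needed. */
static void mock_get_sync(struct mock_dev *dev)
{
	dev->usage_count++;
	if (dev->status != MOCK_RPM_ACTIVE)
		mock_runtime_resume(dev);
}
```

With the device already active, resume_calls stays at zero; only a device that is not active reaches the callback that could block.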

If ufshcd_rpm_get_sync() is simply removed, consider this case:
ufshcd_err_handler() is in progress and tries to abort some tasks (which
may themselves get stuck and time out).
Meanwhile, the runtime PM usage count drops to zero and the device tries
to runtime suspend. The runtime suspend callback may then return an I/O
error, and I/O hangs.
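A sketch of this race under the same kind of simplification, assuming a put that suspends synchronously as soon as the count reaches zero (real runtime PM goes through idle and autosuspend handling first). All mock_* names and suspends_during_err_handling() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct mock_dev {
	bool suspended;
	int usage_count;
};

/* Simplified get: take a reference and make sure the device is active. */
static void mock_get(struct mock_dev *dev)
{
	dev->usage_count++;
	dev->suspended = false;
}

/* Simplified put: suspend immediately when the last reference goes away. */
static void mock_put(struct mock_dev *dev)
{
	assert(dev->usage_count > 0);
	if (--dev->usage_count == 0)
		dev->suspended = true;	/* runtime suspend callback would run here */
}

/*
 * Returns true if the device would be suspended while the error handler
 * is still aborting tasks.
 */
static bool suspends_during_err_handling(bool eh_takes_ref)
{
	struct mock_dev dev = { .suspended = false, .usage_count = 1 }; /* one other user */
	bool suspended_midway;

	if (eh_takes_ref)
		mock_get(&dev);		/* error handler pins the device */
	mock_put(&dev);			/* the other user drops its reference */
	suspended_midway = dev.suspended;	/* abort still in progress here */
	if (eh_takes_ref)
		mock_put(&dev);		/* error handling finished */
	return suspended_midway;
}
```

In this model, only the variant that holds its own reference keeps the device active until error handling is done.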

Thanks.
Peter