Re: [PATCH v4 06/10] scsi: ufs: Remove host_sem used in suspend/resume

Can Guo <cang@xxxxxxxxxxxxxx> · Thu, 24 Jun 2021 14:12:31 +0800

On 2021-06-24 13:52, Adrian Hunter wrote:
On 24/06/21 5:16 am, Can Guo wrote:
On 2021-06-23 22:30, Adrian Hunter wrote:
On 23/06/21 10:35 am, Can Guo wrote:
To protect system suspend/resume from being disturbed by error 
handling,
instead of using host_sem, let error handler call 
lock_system_sleep() and
unlock_system_sleep() which achieve the same purpose. Remove the 
host_sem
used in suspend/resume paths to make the code more readable.

Suggested-by: Bart Van Assche <bvanassche@xxxxxxx>
Signed-off-by: Can Guo <cang@xxxxxxxxxxxxxx>
---
 drivers/scsi/ufs/ufshcd.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 3695dd2..a09e4a2 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -5907,6 +5907,11 @@ static void ufshcd_clk_scaling_suspend(struct 
ufs_hba *hba, bool suspend)

 static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
 {
+    /*
+     * It is not safe to perform error handling while suspend or 
resume is
+     * in progress. Hence the lock_system_sleep() call.
+     */
+    lock_system_sleep();

It looks to me like the system takes this lock quite early, even 
before
freezing tasks, so if anything needs the error handler to run it will
deadlock.

Hi Adrian,

UFS/hba system suspend/resume does not invoke or call error handling 
in a
synchronous way. So, whatever UFS errors (which schedules the error 
handler)
happens during suspend/resume, error handler will just wait here till 
system
suspend/resume release the lock. Hence no worries of deadlock here.

It looks to me like the state can change to 
UFSHCD_STATE_EH_SCHEDULED_FATAL
and since user processes are not frozen, nor file systems sync'ed, 
everything
is going to deadlock.
i.e.
I/O is blocked waiting on error handling
error handling is blocked waiting on lock_system_sleep()
suspend is blocked waiting on I/O


Hi Adrian,

First of all, enter_state(suspend_state_t state) uses 
mutex_trylock(&system_transition_mutex).
Second, even that happens, in ufshcd_queuecommand(), below logic will 
break the cycle, by
fast failing the PM request (below codes are from the code tip with this 
whole series applied).

        case UFSHCD_STATE_EH_SCHEDULED_FATAL:
                /*
                 * ufshcd_rpm_get_sync() is used at error handling 
preparation
                 * stage. If a scsi cmd, e.g., the SSU cmd, is sent from 
the
                 * PM ops, it can never be finished if we let SCSI layer 
keep
                 * retrying it, which gets err handler stuck forever. 
Neither
                 * can we let the scsi cmd pass through, because UFS is 
in bad
                 * state, the scsi cmd may eventually time out, which 
will get
                 * err handler blocked for too long. So, just fail the 
scsi cmd
                 * sent from PM ops, err handler can recover PM error 
anyways.
                 */
                if (cmd->request->rq_flags & RQF_PM) {
                        hba->force_reset = true;
                        set_host_byte(cmd, DID_BAD_TARGET);
                        cmd->scsi_done(cmd);
                        goto out;
                }
                fallthrough;
        case UFSHCD_STATE_RESET:

Thanks,

Can Guo.


Thanks,

Can Guo.


     ufshcd_rpm_get_sync(hba);
     if 
(pm_runtime_status_suspended(&hba->sdev_ufs_device->sdev_gendev) ||
         hba->is_wlu_sys_suspended) {
@@ -5951,6 +5956,7 @@ static void 
ufshcd_err_handling_unprepare(struct ufs_hba *hba)
         ufshcd_clk_scaling_suspend(hba, false);
     ufshcd_clear_ua_wluns(hba);
     ufshcd_rpm_put(hba);
+    unlock_system_sleep();
 }

 static inline bool ufshcd_err_handling_should_stop(struct ufs_hba 
*hba)
@@ -9053,16 +9059,13 @@ static int ufshcd_wl_suspend(struct device 
*dev)
     ktime_t start = ktime_get();

     hba = shost_priv(sdev->host);
-    down(&hba->host_sem);

     if (pm_runtime_suspended(dev))
         goto out;

     ret = __ufshcd_wl_suspend(hba, UFS_SYSTEM_PM);
-    if (ret) {
+    if (ret)
         dev_err(&sdev->sdev_gendev, "%s failed: %d\n", __func__,  
ret);
-        up(&hba->host_sem);
-    }

 out:
     if (!ret)
@@ -9095,7 +9098,6 @@ static int ufshcd_wl_resume(struct device 
*dev)
         hba->curr_dev_pwr_mode, hba->uic_link_state);
     if (!ret)
         hba->is_wlu_sys_suspended = false;
-    up(&hba->host_sem);
     return ret;
 }
 #endif