On 24/06/21 5:16 am, Can Guo wrote: > On 2021-06-23 22:30, Adrian Hunter wrote: >> On 23/06/21 10:35 am, Can Guo wrote: >>> To protect system suspend/resume from being disturbed by error handling, >>> instead of using host_sem, let error handler call lock_system_sleep() and >>> unlock_system_sleep() which achieve the same purpose. Remove the host_sem >>> used in suspend/resume paths to make the code more readable. >>> >>> Suggested-by: Bart Van Assche <bvanassche@xxxxxxx> >>> Signed-off-by: Can Guo <cang@xxxxxxxxxxxxxx> >>> --- >>> drivers/scsi/ufs/ufshcd.c | 12 +++++++----- >>> 1 file changed, 7 insertions(+), 5 deletions(-) >>> >>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c >>> index 3695dd2..a09e4a2 100644 >>> --- a/drivers/scsi/ufs/ufshcd.c >>> +++ b/drivers/scsi/ufs/ufshcd.c >>> @@ -5907,6 +5907,11 @@ static void ufshcd_clk_scaling_suspend(struct ufs_hba *hba, bool suspend) >>> >>> static void ufshcd_err_handling_prepare(struct ufs_hba *hba) >>> { >>> + /* >>> + * It is not safe to perform error handling while suspend or resume is >>> + * in progress. Hence the lock_system_sleep() call. >>> + */ >>> + lock_system_sleep(); >> >> It looks to me like the system takes this lock quite early, even before >> freezing tasks, so if anything needs the error handler to run it will >> deadlock. > > Hi Adrian, > > UFS/hba system suspend/resume does not invoke or call error handling in a > synchronous way. So, whatever UFS errors (which schedules the error handler) > happens during suspend/resume, error handler will just wait here till system > suspend/resume release the lock. Hence no worries of deadlock here. It looks to me like the state can change to UFSHCD_STATE_EH_SCHEDULED_FATAL and since user processes are not frozen, nor file systems sync'ed, everything is going to deadlock. i.e. I/O is blocked waiting on error handling error handling is blocked waiting on lock_system_sleep() suspend is blocked waiting on I/O > > Thanks, > > Can Guo. > >> >>> ufshcd_rpm_get_sync(hba); >>> if (pm_runtime_status_suspended(&hba->sdev_ufs_device->sdev_gendev) || >>> hba->is_wlu_sys_suspended) { >>> @@ -5951,6 +5956,7 @@ static void ufshcd_err_handling_unprepare(struct ufs_hba *hba) >>> ufshcd_clk_scaling_suspend(hba, false); >>> ufshcd_clear_ua_wluns(hba); >>> ufshcd_rpm_put(hba); >>> + unlock_system_sleep(); >>> } >>> >>> static inline bool ufshcd_err_handling_should_stop(struct ufs_hba *hba) >>> @@ -9053,16 +9059,13 @@ static int ufshcd_wl_suspend(struct device *dev) >>> ktime_t start = ktime_get(); >>> >>> hba = shost_priv(sdev->host); >>> - down(&hba->host_sem); >>> >>> if (pm_runtime_suspended(dev)) >>> goto out; >>> >>> ret = __ufshcd_wl_suspend(hba, UFS_SYSTEM_PM); >>> - if (ret) { >>> + if (ret) >>> dev_err(&sdev->sdev_gendev, "%s failed: %d\n", __func__, ret); >>> - up(&hba->host_sem); >>> - } >>> >>> out: >>> if (!ret) >>> @@ -9095,7 +9098,6 @@ static int ufshcd_wl_resume(struct device *dev) >>> hba->curr_dev_pwr_mode, hba->uic_link_state); >>> if (!ret) >>> hba->is_wlu_sys_suspended = false; >>> - up(&hba->host_sem); >>> return ret; >>> } >>> #endif >>>