On 16/09/22 21:42, Bart Van Assche wrote: > The following deadlocks have been observed on multiple test setups: > > * ufshcd_wl_suspend() is waiting for blk_execute_rq() to complete while it > holds host_sem. > * ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and the > latter function tries to obtain host_sem. > This is a deadlock because blk_execute_rq() can't execute SCSI commands > while the host is in the SHOST_RECOVERY state and because the error > handler cannot make progress either. Hi Bart Did you consider something like: diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 7256e6c43ca6..dc83b38dfde9 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -7374,6 +7374,9 @@ static int ufshcd_eh_host_reset_handler(struct scsi_cmnd *cmd) hba = shost_priv(cmd->device->host); + if (hba->pm_op_in_progress) + return FAST_IO_FAIL; + spin_lock_irqsave(hba->host->host_lock, flags); hba->force_reset = true; ufshcd_schedule_eh_work(hba); > > * ufshcd_wl_runtime_resume() is waiting for blk_execute_rq() to finish > while it holds host_sem. > * ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and the > latter function calls pm_runtime_resume(). > This is a deadlock because of the same reason as the previous scenario. > > Fix both deadlocks by not obtaining host_sem from the power management > code paths. Removing the host_sem locking from the power management code > is safe because the ufshcd_err_handler() is already serialized against > SCSI command execution. The original commit for host_sem was aimed at sysfs (see commit below). Did you consider how sysfs access is affected? commit 9cd20d3f473619d8d482551d15d4cebfb3ce73c8 Author: Can Guo <cang@xxxxxxxxxxxxxx> Date: Wed Jan 13 19:13:28 2021 -0800 scsi: ufs: Protect PM ops and err_handler from user access through sysfs User layer may access sysfs nodes when system PM ops or error handling is running. This can cause various problems. Rename eh_sem to host_sem and use it to protect PM ops and error handling from user layer intervention. > > The ufshcd_rpm_get_sync() call at the start of > ufshcd_err_handling_prepare() may deadlock since calling scsi_execute() > is required by the UFS runtime resume implementation. Fixing that > deadlock falls outside the scope of this patch. Do you mean: static void ufshcd_err_handling_prepare(struct ufs_hba *hba) { ufshcd_rpm_get_sync(hba); because that is the host controller, not the UFS device, that is being resumed. > > Cc: dh0421.hwang@xxxxxxxxxxx > Cc: Asutosh Das <asutoshd@xxxxxxxxxxxxxx> > Fixes: b294ff3e3449 ("scsi: ufs: core: Enable power management for wlun") > Signed-off-by: Bart Van Assche <bvanassche@xxxxxxx> > --- > drivers/ufs/core/ufshcd.c | 6 +----- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > index 7c15cbc737b4..cd3c2aa981c6 100644 > --- a/drivers/ufs/core/ufshcd.c > +++ b/drivers/ufs/core/ufshcd.c > @@ -9254,16 +9254,13 @@ static int ufshcd_wl_suspend(struct device *dev) > ktime_t start = ktime_get(); > > hba = shost_priv(sdev->host); > - down(&hba->host_sem); > > if (pm_runtime_suspended(dev)) > goto out; > > ret = __ufshcd_wl_suspend(hba, UFS_SYSTEM_PM); > - if (ret) { > + if (ret) > dev_err(&sdev->sdev_gendev, "%s failed: %d\n", __func__, ret); > - up(&hba->host_sem); > - } > > out: > if (!ret) > @@ -9296,7 +9293,6 @@ static int ufshcd_wl_resume(struct device *dev) > hba->curr_dev_pwr_mode, hba->uic_link_state); > if (!ret) > hba->is_sys_suspended = false; > - up(&hba->host_sem); > return ret; > } > #endif