Re: [PATCH v3 16/18] scsi: ufs: Synchronize SCSI and UFS error handling

Bart Van Assche <bvanassche@xxxxxxx> · Sun, 29 Aug 2021 15:18:30 -0700

On 8/28/21 02:47, Adrian Hunter wrote:
There is a deadlock that seems to be related to this patch, because now
requests are blocked while the error handler waits on the host_sem.

Hi Adrian,

Some but not all of the issues mentioned below have been introduced by 
patch 16/18. Anyway, thank you for having shared your concerns.

Example:

ufshcd_err_handler() races with ufshcd_wl_suspend() for host_sem.
ufshcd_wl_suspend() wins the race but now PM requests deadlock:

because:
  scsi_queue_rq() -> scsi_host_queue_ready() -> scsi_host_in_recovery() is FALSE

because:
  scsi_schedule_eh() has done:
	    scsi_host_set_state(shost, SHOST_RECOVERY) == 0 ||
	    scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY) == 0)

Some questions for thought:

Won't any holder of host_sem deadlock if it tries to do SCSI requests
and the error handler is waiting on host_sem?

Won't runtime resume deadlock if it is initiated by the error handler?

My understanding is that host_sem is used for the following purposes:
- To prevent that sysfs attributes are read or written after shutdown
  has started (hba->shutting_down).
- To serialize sysfs attribute access, clock scaling, error handling,
  the ufshcd_probe_hba() call from ufshcd_async_scan() and hibernation.

I propose to make the following changes:
- Instead of checking the value of hba->shutting_down from inside sysfs
  attribute callbacks, remove sysfs attributes before starting shutdown.
  That will remove the need to check hba->shutting_down from inside
  sysfs attribute callbacks.
- Leave out the host_sem down() and up() calls from ufshcd_wl_suspend()
  and ufshcd_wl_resume(). Serializing hibernation against e.g. sysfs
  attribute access is not the responsibility of a SCSI LLD - this is the
  responsibility of the power management core.
- Split host_sem. I don't see how else to address the potential deadlock
  between the error handler and runtime resume.

Do you agree with the above?

Thanks,

Bart.