On 8/28/21 02:47, Adrian Hunter wrote:
There is a deadlock that seems to be related to this patch, because now
requests are blocked while the error handler waits on the host_sem.
Hi Adrian,
Some but not all of the issues mentioned below have been introduced by
patch 16/18. Anyway, thank you for having shared your concerns.
Example:
ufshcd_err_handler() races with ufshcd_wl_suspend() for host_sem.
ufshcd_wl_suspend() wins the race but now PM requests deadlock:
because:
scsi_queue_rq() -> scsi_host_queue_ready() -> scsi_host_in_recovery() is FALSE
because:
scsi_schedule_eh() has done:
scsi_host_set_state(shost, SHOST_RECOVERY) == 0 ||
scsi_host_set_state(shost, SHOST_CANCEL_RECOVERY) == 0)
Some questions for thought:
Won't any holder of host_sem deadlock if it tries to do SCSI requests
and the error handler is waiting on host_sem?
Won't runtime resume deadlock if it is initiated by the error handler?
My understanding is that host_sem is used for the following purposes:
- To prevent that sysfs attributes are read or written after shutdown
has started (hba->shutting_down).
- To serialize sysfs attribute access, clock scaling, error handling,
the ufshcd_probe_hba() call from ufshcd_async_scan() and hibernation.
I propose to make the following changes:
- Instead of checking the value of hba->shutting_down from inside sysfs
attribute callbacks, remove sysfs attributes before starting shutdown.
That will remove the need to check hba->shutting_down from inside
sysfs attribute callbacks.
- Leave out the host_sem down() and up() calls from ufshcd_wl_suspend()
and ufshcd_wl_resume(). Serializing hibernation against e.g. sysfs
attribute access is not the responsibility of a SCSI LLD - this is the
responsibility of the power management core.
- Split host_sem. I don't see how else to address the potential deadlock
between the error handler and runtime resume.
Do you agree with the above?
Thanks,
Bart.