Hello Bart,
On 9/16/2022 11:42 AM, Bart Van Assche wrote:
> The following deadlocks have been observed on multiple test setups:
>
> * ufshcd_wl_suspend() is waiting for blk_execute_rq() to complete while it
>   holds host_sem.
> * ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and the
>   latter function tries to obtain host_sem.
> This is a deadlock because blk_execute_rq() can't execute SCSI commands
> while the host is in the SHOST_RECOVERY state and because the error
> handler cannot make progress either.
>
> * ufshcd_wl_runtime_resume() is waiting for blk_execute_rq() to finish
>   while it holds host_sem.
> * ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and the
>   latter function calls pm_runtime_resume().
> This is a deadlock because of the same reason as the previous scenario.
>
> Fix both deadlocks by not obtaining host_sem from the power management
> code paths. Removing the host_sem locking from the power management code
> is safe because the ufshcd_err_handler() is already serialized against
> SCSI command execution.
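
For reference, here is how I read the first scenario, as a toy state
model. This is only a sketch: struct toy_host, cmd_can_execute(),
eh_can_proceed() and scenario_deadlocks() are hypothetical names made up
for illustration, not ufshcd code, and the model ignores real locking and
scheduling.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical types and helpers, not ufshcd code. */
enum host_state { HOST_RUNNING, HOST_RECOVERY };

struct toy_host {
	enum host_state state;	/* stands in for the SCSI host state */
	bool host_sem_held;	/* stands in for host_sem being held */
};

/* A command can only execute while the host is not in recovery. */
static bool cmd_can_execute(const struct toy_host *h)
{
	return h->state != HOST_RECOVERY;
}

/* The error handler proceeds if it needs no semaphore or the semaphore is free. */
static bool eh_can_proceed(const struct toy_host *h, bool eh_needs_sem)
{
	return !eh_needs_sem || !h->host_sem_held;
}

/*
 * Replay scenario 1: suspend waits for a command while the error handler
 * starts recovery. Returns true if neither side can make progress.
 */
static bool scenario_deadlocks(bool pm_takes_host_sem)
{
	struct toy_host h = { .state = HOST_RUNNING, .host_sem_held = false };

	h.host_sem_held = pm_takes_host_sem;	/* suspend path takes host_sem */
	h.state = HOST_RECOVERY;		/* host reset handler kicks in */

	/* The command issued by suspend is blocked during recovery. */
	assert(!cmd_can_execute(&h));

	if (eh_can_proceed(&h, true))
		h.state = HOST_RUNNING;		/* recovery completes */

	return !cmd_can_execute(&h);
}
```

In this model scenario_deadlocks(true) reports a deadlock and
scenario_deadlocks(false) does not, which matches the fix described
above. It says nothing about whether suspend and the error handler can
race once host_sem is no longer taken.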
Say there's a PWR_FATAL error in ufshcd_wl_suspend().
Without host_sem, wouldn't there be a scenario in which suspend and the
error handler run simultaneously?
Do you see issues when that happens? How about when shutdown runs
simultaneously with the error handler?
-asd