Re: [PATCH] scsi: ufs: Fix deadlocks between power management and error handler

"Asutosh Das (asd)" <quic_asutoshd@xxxxxxxxxxx> · Sun, 18 Sep 2022 20:10:26 -0700

Hello Bart,

On 9/16/2022 11:42 AM, Bart Van Assche wrote:
> The following deadlocks have been observed on multiple test setups:
>
> * ufshcd_wl_suspend() is waiting for blk_execute_rq() to complete while it
>   holds host_sem.
> * ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and the
>   latter function tries to obtain host_sem.
> This is a deadlock because blk_execute_rq() can't execute SCSI commands
> while the host is in the SHOST_RECOVERY state and because the error
> handler cannot make progress either.
>
> * ufshcd_wl_runtime_resume() is waiting for blk_execute_rq() to finish
>   while it holds host_sem.
> * ufshcd_eh_host_reset_handler() invokes ufshcd_err_handler() and the
>   latter function calls pm_runtime_resume().
> This is a deadlock because of the same reason as the previous scenario.
>
> Fix both deadlocks by not obtaining host_sem from the power management
> code paths. Removing the host_sem locking from the power management code
> is safe because the ufshcd_err_handler() is already serialized against
> SCSI command execution.
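
For reference, the first scenario quoted above can be reduced to a small
user-space model. This is only a sketch under stated assumptions: the names
host_model, pm_command_can_complete() and scenario_deadlocks() are
hypothetical and are not kernel APIs, and the two booleans merely stand in
for host_sem and the SHOST_RECOVERY state. It checks whether either side can
make progress depending on whether the PM path holds host_sem:

```c
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct host_model {
	bool host_sem_held;	/* stands in for host_sem held by a PM callback */
	bool in_recovery;	/* stands in for the host being in SHOST_RECOVERY */
};

/* The PM callback's command can only complete outside of recovery. */
static bool pm_command_can_complete(const struct host_model *h)
{
	return !h->in_recovery;
}

/*
 * Returns true if neither the PM callback nor the error handler can make
 * progress. pm_takes_host_sem selects whether the PM path holds host_sem
 * while it waits for its command.
 */
static bool scenario_deadlocks(bool pm_takes_host_sem)
{
	struct host_model h = {
		.host_sem_held = pm_takes_host_sem,
		.in_recovery = true,	/* the reset handler has been invoked */
	};
	bool pm_progress = pm_command_can_complete(&h);
	/* In this model the error handler needs host_sem to proceed. */
	bool eh_progress = !h.host_sem_held;

	if (eh_progress) {
		/* The error handler finishes and the host leaves recovery. */
		h.in_recovery = false;
		pm_progress = pm_command_can_complete(&h);
	}
	return !pm_progress && !eh_progress;
}
```

In this model scenario_deadlocks(true) reports a cycle and
scenario_deadlocks(false) does not. It says nothing about whether suspend
and the error handler are safe to run concurrently once the lock is gone.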

Say there's a PWR_FATAL error in ufshcd_wl_suspend().
Wouldn't there then be a scenario in which suspend and the error handler
run simultaneously?
Do you see issues when that happens? What about when shutdown runs
simultaneously with the error handler?

-asd