On Wed, 8 Jan 2025 at 14:39, Manivannan Sadhasivam via B4 Relay <devnull+manivannan.sadhasivam.linaro.org@xxxxxxxxxx> wrote: > > From: Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxx> > > There are multiple places from where the recovery work gets scheduled > asynchronously. Also, there are multiple places where the caller waits > synchronously for the recovery to be completed. One such place is during > the PM shutdown() callback. > > If the device is not alive during recovery_work, it will try to reset the > device using pci_reset_function(). This function internally will take the > device_lock() first before resetting the device. By this time, if the lock > has already been acquired, then recovery_work will get stalled while > waiting for the lock. And if the lock was already acquired by the caller > which waits for the recovery_work to be completed, it will lead to > deadlock. > > This is what happened on the X1E80100 CRD device when the device died > before shutdown() callback. Driver core calls the driver's shutdown() > callback while holding the device_lock() leading to deadlock. > > And this deadlock scenario can occur on other paths as well, like during > the PM suspend() callback, where the driver core would hold the > device_lock() before calling driver's suspend() callback. And if the > recovery_work was already started, it could lead to deadlock. This is also > observed on the X1E80100 CRD. > > So to fix both issues, use pci_try_reset_function() in recovery_work. This > function first checks for the availability of the device_lock() before > trying to reset the device. If the lock is available, it will acquire it > and reset the device. Otherwise, it will return -EAGAIN. If that happens, > recovery_work will fail with the error message "Recovery failed" as not > much could be done. > > Cc: stable@xxxxxxxxxxxxxxx # 5.12 > Reported-by: Johan Hovold <johan@xxxxxxxxxx> > Closes: https://lore.kernel.org/mhi/Z1me8iaK7cwgjL92@xxxxxxxxxxxxxxxxxxxx > Fixes: 7389337f0a78 ("mhi: pci_generic: Add suspend/resume/recovery procedure") > Signed-off-by: Manivannan Sadhasivam <manivannan.sadhasivam@xxxxxxxxxx> Reviewed-by: Loic Poulain <loic.poulain@xxxxxxxxxx>