Re: mhi resume failure on reboot with 6.13-rc2

On Thu, Dec 19, 2024 at 09:36:32AM +0100, Johan Hovold wrote:
> On Thu, Dec 19, 2024 at 12:05:55AM +0530, Manivannan Sadhasivam wrote:
> > On Wed, Dec 18, 2024 at 03:26:38PM +0100, Johan Hovold wrote:
> > > On Wed, Dec 18, 2024 at 07:39:10PM +0530, Manivannan Sadhasivam wrote:
> > > > On Wed, Dec 18, 2024 at 02:55:02PM +0100, Johan Hovold wrote:
> 
> > > > > But that's not going to happen as that reset is what is currently
> > > > > causing the deadlock and which would simply be skipped if you switch to
> > > > > pci_try_reset_function().
> > > > > 
> > > > 
> > > > mhi_pci_runtime_resume() will queue the recovery_work() and return. So I was
> > > > hoping that by the time pci_try_reset_function() is called, the lock would be
> > > > available.
> > > 
> > > We can't rely on luck with timings, and this is the very reason for the
> > > deadlock I'm currently seeing (i.e. the recovery thread is still running
> > > when another thread grabs the lock and waits for the recovery thread to
> > > finish).
> > > 
> > > Perhaps the recovery work should be done synchronously in the resume
> > > handler to avoid such issues.
> > 
> > Synchronously? How can that help when the recovery_work() cannot acquire the
> > lock?
> 
> During system suspend, pm core waits for any on-going runtime resume
> operations to complete before taking the device lock and suspending the
> device.
> 

Right, but mhi_pci_runtime_resume() is also called from mhi_pci_resume(). So we
cannot safely carry out the recovery_work() synchronously without the
pci_try_reset_function() change.
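
To illustrate the point, here is a minimal userspace sketch in C, not driver
code. It assumes the system resume path is entered with the device lock already
held, and that a "try" style reset fails with -EAGAIN instead of blocking when
it cannot take that lock. The pthread mutex and every model_* name are
hypothetical stand-ins for the device lock, pci_try_reset_function(),
mhi_pci_runtime_resume() and mhi_pci_resume().

```c
#include <errno.h>
#include <pthread.h>

/* Hypothetical stand-in for the device lock. */
static pthread_mutex_t model_dev_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Models a "try" style reset: if the lock is already held, return
 * -EAGAIN instead of blocking. A blocking lock here would hang forever
 * when the caller itself is the lock holder.
 */
static int model_try_reset(void)
{
	if (pthread_mutex_trylock(&model_dev_lock) != 0)
		return -EAGAIN;

	/* The actual reset would happen here. */

	pthread_mutex_unlock(&model_dev_lock);
	return 0;
}

/* Models a runtime resume handler that recovers synchronously. */
static int model_runtime_resume(void)
{
	return model_try_reset();
}

/*
 * Models the system resume path: the lock is held by the caller
 * (assumption) while the same resume handler is reused.
 */
static int model_system_resume(void)
{
	int ret;

	pthread_mutex_lock(&model_dev_lock);
	ret = model_runtime_resume();
	pthread_mutex_unlock(&model_dev_lock);

	return ret;
}
```

In this model, model_runtime_resume() called on its own returns 0, while
model_system_resume() gets -EAGAIN back instead of deadlocking on a lock its
own caller holds.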

> Unfortunately, that's currently not the case during shutdown() where
> those operations are reversed, so that would indeed need to be addressed
> first.
> 
> But what the driver is currently doing looks highly questionable as it
> returns success when it failed to resume the device (after scheduling
> the asynchronous recovery work).
> 

I completely agree, and this goes against what the PM core expects. IMO we need
two fixes: one that uses pci_try_reset_function(), and another that recovers the
device synchronously from mhi_pci_runtime_resume() and passes the return value
to the PM core.
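
The difference in what the caller sees can be sketched with a small userspace
model in C. This is only an illustration under stated assumptions, not the
driver's code or the eventual patches: struct model_dev, its fields and the
model_* functions are all hypothetical, and -EIO is an arbitrary error chosen
for the example.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical device state for the model. */
struct model_dev {
	bool wake_ok;          /* device came back on resume */
	bool recovery_ok;      /* recovery would succeed */
	bool recovery_queued;  /* asynchronous recovery was scheduled */
};

/* Models the recovery step; -EIO is an arbitrary example error. */
static int model_recover(struct model_dev *d)
{
	return d->recovery_ok ? 0 : -EIO;
}

/*
 * Behaviour questioned in the thread: on resume failure, schedule
 * recovery and report success, so the caller never learns the outcome.
 */
static int model_resume_async(struct model_dev *d)
{
	if (!d->wake_ok)
		d->recovery_queued = true;

	return 0;
}

/*
 * Proposed direction: recover synchronously and pass the result back,
 * so a failed recovery is reported as a failed resume.
 */
static int model_resume_sync(struct model_dev *d)
{
	if (!d->wake_ok)
		return model_recover(d);

	return 0;
}
```

With wake_ok and recovery_ok both false, model_resume_async() still returns 0,
whereas model_resume_sync() returns the error from the recovery step.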

Will post the patches.

- Mani

-- 
மணிவண்ணன் சதாசிவம்
