Re: Question about deadlock between AER and pciehp interrupts during resume from S3 with unplugged device

On Wed, Feb 09, 2022 at 02:54:06PM -0500, Andrey Grodzovsky wrote:
> Hi, on kernel based on 5.4.2 we are observing a deadlock between
> reset_lock semaphore and device_lock (dev->mutex). The scenario
> we do is putting the system to sleep, disconnecting the eGPU
> from the PCIe bus (through a special SBIOS setting) or by simply
> removing power to external PCIe cage and waking the
> system up.
> 
> I attached the log. Please advise if you have any idea how
> to work around it ? Since the kernel is old, does anyone
> have an idea if this issue is known and already solved in later kernels ?
> We cannot try with latest since our kernel is custom for that platform.

It is a known issue.  Here's a fix I submitted during the v5.9 cycle:

https://lore.kernel.org/linux-pci/908047f7699d9de9ec2efd6b79aa752d73dab4b6.1595329748.git.lukas@xxxxxxxxx/

The fix hasn't been applied yet.  I think I need to rework the patch,
just haven't found the time.

Since the trigger in your case is a series of AER-handled errors
during a system sleep transition, you may also want to consider
the following 2-patch series by Kai-Heng Feng, which is currently
under discussion:

https://lore.kernel.org/linux-pci/20220127025418.1989642-1-kai.heng.feng@xxxxxxxxxxxxx/

That series disables AER during a system sleep transition and
should thus prevent the flood of AER-handled errors you're seeing.
Once AER is disabled, the reset-induced deadlocks should go away as well.
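
For illustration only, here is a user-space sketch of the lock ordering
problem between reset_lock and device_lock. It assumes the hotplug path
takes reset_lock first and then wants device_lock, while the AER reset
path takes device_lock first and then wants reset_lock. This is not
kernel code: reset_lock is modelled as a plain mutex rather than an
rw_semaphore, the names simulate_lock_inversion(), run_path() and
struct path are hypothetical, and pthread_mutex_trylock() stands in for
a blocking acquire so that the sketch terminates instead of hanging.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Hypothetical user-space stand-ins for the two kernel locks. */
static pthread_mutex_t reset_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t device_lock = PTHREAD_MUTEX_INITIALIZER;

/* Barriers force the worst-case interleaving deterministically. */
static pthread_barrier_t both_hold_first;
static pthread_barrier_t both_tried_second;

struct path {
	pthread_mutex_t *first;   /* lock taken first by this path */
	pthread_mutex_t *second;  /* lock this path then waits for */
	int would_block;          /* set if the second lock was busy */
};

static void *run_path(void *arg)
{
	struct path *p = arg;
	int rc;

	pthread_mutex_lock(p->first);
	/* Wait until the other path holds its first lock as well. */
	pthread_barrier_wait(&both_hold_first);

	/* A blocking lock here would never return; trylock reports it. */
	rc = pthread_mutex_trylock(p->second);
	p->would_block = (rc == EBUSY);

	/* Keep the first lock held until both paths have tried. */
	pthread_barrier_wait(&both_tried_second);

	if (rc == 0)
		pthread_mutex_unlock(p->second);
	pthread_mutex_unlock(p->first);
	return NULL;
}

/* Returns how many of the two paths would have blocked forever. */
int simulate_lock_inversion(void)
{
	/* Assumed ordering: hotplug takes reset_lock, then device_lock. */
	struct path hotplug = { &reset_lock, &device_lock, 0 };
	/* Assumed ordering: AER reset takes device_lock, then reset_lock. */
	struct path aer = { &device_lock, &reset_lock, 0 };
	pthread_t t1, t2;
	int rc;

	rc = pthread_barrier_init(&both_hold_first, NULL, 2);
	assert(rc == 0);
	rc = pthread_barrier_init(&both_tried_second, NULL, 2);
	assert(rc == 0);

	rc = pthread_create(&t1, NULL, run_path, &hotplug);
	assert(rc == 0);
	rc = pthread_create(&t2, NULL, run_path, &aer);
	assert(rc == 0);

	pthread_join(t1, NULL);
	pthread_join(t2, NULL);

	pthread_barrier_destroy(&both_hold_first);
	pthread_barrier_destroy(&both_tried_second);

	return hotplug.would_block + aer.would_block;
}
```

Build with -pthread on Linux and call simulate_lock_inversion() from a
small main(). If both paths report that they would block, neither can
make progress without the other releasing its first lock, which is the
kind of hang described in the report. Removing one of the two paths
(here, the AER-triggered reset) breaks the cycle.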

Thanks,

Lukas
