Re: [Regression] [PCI/ASPM] [ASUS PN51] Reboot on resume attempt (bisect done; commit found)

On Mon, Dec 25, 2023 at 07:29:02PM +0100, Michael Schaller wrote:
> Issue:
> On resume from suspend to RAM there is no output for about 12 seconds, then
> a blinking cursor briefly appears in the upper left corner of an otherwise
> black screen, followed by a reboot.
> 
> Setup:
> * Machine: ASUS mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1)
> * Firmware: 0508 (latest; also tested previous 0505)
> * OS: Ubuntu 23.10 (except kernel)
> * Kernel: 6.6.8 (also tested 6.7-rc7; config attached)
> 
> Debugging summary:
> * Kernel 5.10.205 isn’t affected.
> * Bisect identified commit 08d0cc5f34265d1a1e3031f319f594bd1970976c as
> cause.
> * PCI device 0000:03:00.0 (Intel 8265 Wifi) causes resume issues as long as
> ASPM is enabled (default).
> * The commit message indicates that a quirk could be written to mitigate the
> issue but I don’t know how to write such a quirk.
> 
> Confirmed workarounds:
> * Connect a USB flash drive (no clue why; maybe this causes a delay that
> lets the resume succeed)
> * Revert commit 08d0cc5f34265d1a1e3031f319f594bd1970976c (commit seemed
> intentional; a quirk seems to be the preferred solution)
> * pcie_aspm=off
> * pcie_aspm.policy=performance
> * echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm
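The last workaround above can also be applied from a script. The following is a minimal Python sketch, assuming the `link/l1_aspm` attribute described in the kernel's sysfs-bus-pci ABI documentation is present for the device. The helper names `set_l1_aspm` and `get_l1_aspm` and the `root` parameter are hypothetical; `root` exists only so the sketch can be exercised against a scratch directory instead of a live system. Writing requires root privileges, and, like the `echo` one-liner, the setting does not survive a reboot.

```python
from pathlib import Path

# Default sysfs location (overridable for testing; hypothetical parameter).
SYSFS_PCI_DEVICES = Path("/sys/bus/pci/devices")


def set_l1_aspm(bdf: str, enabled: bool, root: Path = SYSFS_PCI_DEVICES) -> None:
    """Write '1' or '0' to <root>/<bdf>/link/l1_aspm, like `echo 0 | tee ...`."""
    attr = root / bdf / "link" / "l1_aspm"
    if not attr.exists():
        # The link/ attributes only exist when the kernel exposes ASPM
        # controls for this device's link.
        raise FileNotFoundError(f"{attr} not found")
    attr.write_text("1\n" if enabled else "0\n")


def get_l1_aspm(bdf: str, root: Path = SYSFS_PCI_DEVICES) -> bool:
    """Return True if the l1_aspm attribute currently reads '1'."""
    return (root / bdf / "link" / "l1_aspm").read_text().strip() == "1"
```

Run as root, `set_l1_aspm("0000:03:00.0", enabled=False)` mirrors the `echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm` command listed above.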
> 
> Debugging details:
> * The resume trigger (power button, keyboard, mouse) doesn’t seem to make
> any difference.
> * Double-checked that the kernel is configured to *not* reboot on panic.
> * Double-checked that there is still no kernel output when booting without
> quiet and splash.
> * The issue doesn’t happen if a USB flash drive is connected. The content of
> the flash drive doesn’t appear to matter. The USB port doesn’t appear to
> matter.
> * No information in any logs after the reboot. I suspect the resume from
> suspend to RAM doesn't get far enough for logs to be written.
> * Kernel 5.10.205 isn’t affected. Kernel 5.15.145, 6.6.8 and 6.7-rc7 are
> affected.
> * A kernel bisect has revealed the following commit as cause:
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=08d0cc5f34265d1a1e3031f319f594bd1970976c
> * The commit was part of kernel 6.0 (developed as 5.20) and has been
> backported to 5.15.
> * The commit mentions that a device-specific quirk could be added in case of
> new issues.
> * According to sysfs and lspci only device 0000:03:00.0 (Intel 8265 Wifi)
> has ASPM enabled by default.
> * Disabling ASPM for device 0000:03:00.0 lets the resume from suspend to RAM
> succeed.
> * Enabling ASPM for all devices except 0000:03:00.0 lets the resume from
> suspend to RAM succeed.
> * This would indicate that a quirk is missing for the device 0000:03:00.0
> (Intel 8265 Wifi) but I have no clue how to write such a quirk or how to get
> the specifics for such a quirk.
> * I still have no clue how a USB flash drive plays into all this. Maybe it's
> some kind of timing issue where the connected USB flash drive delays
> something long enough for the resume to succeed. Maybe the code removed by
> commit 08d0cc5f34265d1a1e3031f319f594bd1970976c caused a similar delay.
> ¯\_(ツ)_/¯
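The observation that only 0000:03:00.0 has ASPM enabled by default can be checked by scanning the per-link sysfs attributes of every PCI device. A minimal Python sketch, assuming the `link/` attribute names from the kernel's sysfs-bus-pci ABI documentation (`l0s_aspm`, `l1_aspm`, `l1_1_aspm`, `l1_2_aspm`); the function name `enabled_aspm_states` and its `root` parameter are hypothetical. Reading these attributes does not require root.

```python
from pathlib import Path
from typing import Dict, List

SYSFS_PCI_DEVICES = Path("/sys/bus/pci/devices")

# Per-link ASPM state attributes; not every device exposes all of them.
ASPM_ATTRS = ("l0s_aspm", "l1_aspm", "l1_1_aspm", "l1_2_aspm")


def enabled_aspm_states(root: Path = SYSFS_PCI_DEVICES) -> Dict[str, List[str]]:
    """Map each device BDF to the ASPM attributes that currently read '1'."""
    result: Dict[str, List[str]] = {}
    for dev in sorted(root.iterdir()):
        link = dev / "link"
        if not link.is_dir():
            # No link/ directory: the kernel exposes no ASPM controls here.
            continue
        states = [
            name
            for name in ASPM_ATTRS
            if (link / name).is_file() and (link / name).read_text().strip() == "1"
        ]
        if states:
            result[dev.name] = states
    return result
```

On the machine described above, the expected result would be a single entry for 0000:03:00.0. `lspci -vv` reports the same state in each device's `LnkCtl` line.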

Hmmm.  08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
appeared in v6.0, released Oct 2, 2022, so it's been there a while.

But I think the best option is to revert it until this issue is
resolved.  Per the commit log, 08d0cc5f3426 solved two problems:

  1) ASPM config changes done via sysfs are lost if the device power
     state is changed, e.g., typically set to D3hot in .suspend() and
     D0 in .resume().

  2) If L1SS is restored during system resume, that restored state
     would be overwritten.

Problem 2) relates to a patch that is currently reverted (a7152be79b62
("Revert "PCI/ASPM: Save L1 PM Substates Capability for
suspend/resume"")), so I don't think reverting 08d0cc5f3426 will make
this problem worse.

Reverting 08d0cc5f3426 will make 1) a problem again.  But my guess is
that ASPM changes via sysfs are fairly unusual, and the device probably
remains functional even though it may use more power because the ASPM
configuration was lost.

So unless somebody has a counter-argument, I plan to queue a revert of
08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") for
v6.7.

Bjorn



