Hi Michael,

On Tue, Jan 2, 2024 at 2:57 AM Michael Schaller <michael@xxxxxxxxxxx> wrote:
>
> On 01.01.24 19:13, Bjorn Helgaas wrote:
> > On Mon, Dec 25, 2023 at 07:29:02PM +0100, Michael Schaller wrote:
> >> Issue:
> >> On resume from suspend to RAM there is no output for about 12 seconds, then
> >> shortly a blinking cursor is visible in the upper left corner on an
> >> otherwise black screen which is followed by a reboot.
> >>
> >> Setup:
> >> * Machine: ASUS mini PC PN51-BB757MDE1 (DMI model: MINIPC PN51-E1)
> >> * Firmware: 0508 (latest; also tested previous 0505)
> >> * OS: Ubuntu 23.10 (except kernel)
> >> * Kernel: 6.6.8 (also tested 6.7-rc7; config attached)
> >>
> >> Debugging summary:
> >> * Kernel 5.10.205 isn’t affected.
> >> * Bisect identified commit 08d0cc5f34265d1a1e3031f319f594bd1970976c as
> >> cause.
> >> * PCI device 0000:03:00.0 (Intel 8265 Wifi) causes resume issues as long as
> >> ASPM is enabled (default).
> >> * The commit message indicates that a quirk could be written to mitigate the
> >> issue but I don’t know how to write such a quirk.
> >>
> >> Confirmed workarounds:
> >> * Connect a USB flash drive (no clue why; maybe this causes a delay that
> >> lets the resume succeed)
> >> * Revert commit 08d0cc5f34265d1a1e3031f319f594bd1970976c (commit seemed
> >> intentional; a quirk seems to be the preferred solution)
> >> * pcie_aspm=off
> >> * pcie_aspm.policy=performance
> >> * echo 0 | sudo tee /sys/bus/pci/devices/0000:03:00.0/link/l1_aspm
> >>
> >> Debugging details:
> >> * The resume trigger (power button, keyboard, mouse) doesn’t seem to make
> >> any difference.
> >> * Double checked that the kernel is configured to *not* reboot on panic.
> >> * Double checked that there still isn't any kernel output without quiet and
> >> splash.
> >> * The issue doesn’t happen if a USB flash drive is connected. The content of
> >> the flash drive doesn’t appear to matter. The USB port doesn’t appear to
> >> matter.
> >> * No information in any logs after the reboot. I suspect the resume from
> >> suspend to RAM isn’t getting far enough as that logs could be written.
> >> * Kernel 5.10.205 isn’t affected. Kernel 5.15.145, 6.6.8 and 6.7-rc7 are
> >> affected.
> >> * A kernel bisect has revealed the following commit as cause:
> >> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=08d0cc5f34265d1a1e3031f319f594bd1970976c
> >> * The commit was part of kernel 5.20 and has been backported to 5.15.
> >> * The commit mentions that a device-specific quirk could be added in case of
> >> new issues.
> >> * According to sysfs and lspci only device 0000:03:00.0 (Intel 8265 Wifi)
> >> has ASPM enabled by default.
> >> * Disabling ASPM for device 0000:03:00.0 lets the resume from suspend to RAM
> >> succeed.
> >> * Enabling ASPM for all devices except 0000:03:00.0 lets the resume from
> >> suspend to RAM succeed.
> >> * This would indicate that a quirk is missing for the device 0000:03:00.0
> >> (Intel 8265 Wifi) but I have no clue how to write such a quirk or how to get
> >> the specifics for such a quirk.
> >> * I still have no clue how a USB flash drive plays into all this. Maybe some
> >> kind of a timing issue where the connected USB flash drive delays something
> >> long enough so that the resume succeeds. Maybe the code removed by commit
> >> 08d0cc5f34265d1a1e3031f319f594bd1970976c caused a similar delay. ¯\_(ツ)_/¯
> >
> > Hmmm. 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()")
> > appeared in v6.0, released Oct 2, 2022, so it's been there a while.
> >
> > But I think the best option is to revert it until this issue is
> > resolved. Per the commit log, 08d0cc5f3426 solved two problems:
> >
> >   1) ASPM config changes done via sysfs are lost if the device power
> >      state is changed, e.g., typically set to D3hot in .suspend() and
> >      D0 in .resume().
> >
> >   2) If L1SS is restored during system resume, that restored state
> >      would be overwritten.
> >
> > Problem 2) relates to a patch that is currently reverted (a7152be79b62
> > ("Revert "PCI/ASPM: Save L1 PM Substates Capability for
> > suspend/resume"")), so I don't think reverting 08d0cc5f3426 will make
> > this problem worse.
> >
> > Reverting 08d0cc5f3426 will make 1) a problem again. But my guess is
> > ASPM changes via sysfs are fairly unusual and the device probably
> > remains functional even though it may use more power because the ASPM
> > configuration was lost.
> >
> > So unless somebody has a counter-argument, I plan to queue a revert of
> > 08d0cc5f3426 ("PCI/ASPM: Remove pcie_aspm_pm_state_change()") for
> > v6.7.
> >
> > Bjorn
>
> If it helps I could also try if a partial revert of 08d0cc5f3426 would
> be sufficient. This might also narrow down the issue and give more
> insight where the issue originates from.
>
> Let me know what you think.

Just wondering, does `echo 0 > /sys/power/pm_async` help?

Kai-Heng

>
> Michael
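
For anyone reproducing this, it may help to record the state of the knobs mentioned in the thread before each suspend attempt. The script below is only a sketch under a few assumptions: `SYSFS_ROOT`, `DEV` and the `show_knob` helper are illustrative names, not part of any existing tool; the device address is the one from Michael's report; and the `policy` path is where the `pcie_aspm.policy` parameter is expected to show up in sysfs. It only reads values and changes nothing.

```shell
#!/bin/sh
# Sketch: print the current values of the sysfs knobs discussed in this thread.
# SYSFS_ROOT, DEV and show_knob are illustrative names (assumptions); SYSFS_ROOT
# exists so the script can be pointed at a copy of sysfs instead of /sys.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
DEV="${DEV:-0000:03:00.0}"   # PCI address of the Intel 8265 in the report

show_knob() {
    # $1 = path relative to SYSFS_ROOT
    if [ -r "$SYSFS_ROOT/$1" ]; then
        printf '%s = %s\n' "$1" "$(cat "$SYSFS_ROOT/$1")"
    else
        printf '%s = (not available)\n' "$1"
    fi
}

show_knob "bus/pci/devices/$DEV/link/l1_aspm"    # per-device L1 ASPM switch
show_knob "power/pm_async"                       # asynchronous suspend/resume
show_knob "module/pcie_aspm/parameters/policy"   # global ASPM policy
```

Running it once with the defaults and once after each workaround (for example after writing 0 to `pm_async` as root) gives a quick record of which setting was actually in effect for a given resume attempt.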