= Summary =

A Thinkpad T480s laptop with a Thinkpad Thunderbolt 3 Dock connected can no longer resume from suspend. The problem was introduced in e8b908146d44 "PCI/PM: Increase wait time after resume".

= Detailed description =

When running a kernel containing the identified commit and trying to resume the laptop from sleep, the laptop's power light changes from blinking (sleep state) to steadily lit (running state), but the display stays black. The laptop doesn't respond to keyboard input or to ping/ssh, and no logs are written to the disk (which means I don't know how to gather error logs). It needs to be force-rebooted.

I bisected the kernel and identified the commit which causes this behavior. I used the vanilla kernel with a Fedora kernel config.

The reproducer is:
1. Connect the dock to the laptop.
2. Boot the laptop (in my case, to GDM).
3. Suspend the laptop.
4. Resume the laptop.

This is successful before the identified commit (the last tested good commit was cc8a983d0fce), and unsuccessful (black screen, frozen system) after the identified commit (e8b908146d44). The reproducibility is 100%; I tested it many times in a row.

When the dock is unplugged, suspend and resume work as expected. When I connect a different laptop to the dock (Thinkpad P1 gen3), I don't see any resume failure. So this is somehow related to the particular combination of Thinkpad T480s and Thinkpad Thunderbolt 3 Dock. The dock is running the latest firmware.

I also tested "pcie_aspm=off", and that allows the laptop to resume properly. However, there is then a race condition that determines whether devices on the dock are visible to the OS after resume, so this is not useful even as a workaround.
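For anyone who wants to repeat the suspend/resume steps of the reproducer many times, here is a minimal sketch in Python. It is an assumption on top of this report, not what was used for the testing described above: it relies on rtcwake from util-linux being installed, it needs root, and it wakes the machine via the RTC alarm instead of a manual resume, which might not behave identically. The function names are made up for illustration. On an affected kernel the first cycle is expected to hang, so the final summary line would never be printed.

```python
#!/usr/bin/env python3
"""Hypothetical helper: repeat suspend/resume cycles via rtcwake (run as root)."""
import subprocess
import sys


def build_command(wake_after_s):
    # rtcwake (util-linux): suspend to RAM and set an RTC alarm that wakes
    # the machine after wake_after_s seconds.
    return ["rtcwake", "-m", "mem", "-s", str(wake_after_s)]


def run_cycles(cycles, wake_after_s=30, runner=subprocess.run):
    """Run up to `cycles` suspend/resume cycles; return how many completed."""
    completed = 0
    for i in range(1, cycles + 1):
        print(f"cycle {i}/{cycles}: suspending", flush=True)
        result = runner(build_command(wake_after_s), check=False)
        if result.returncode != 0:
            # rtcwake itself failed (e.g. not root); stop instead of looping.
            break
        completed += 1
    return completed


if __name__ == "__main__":
    count = int(sys.argv[1]) if len(sys.argv) > 1 else 5
    done = run_cycles(count)
    print(f"{done}/{count} cycles resumed")
```

Run it with the dock connected, for example as "sudo python3 suspend_loop.py 10" (the file name is arbitrary). The runner parameter only exists so the loop can be exercised without actually suspending the machine.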

I already created a downstream Fedora bug report in Red Hat Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=2230357

lspci of the laptop:
https://bugzilla-attachments.redhat.com/attachment.cgi?id=1982541

git bisect log:
https://bugzilla-attachments.redhat.com/attachment.cgi?id=1983351

The commit which broke resume is the following:

e8b908146d44310473e43b3382eca126e12d279c is the first bad commit
commit e8b908146d44310473e43b3382eca126e12d279c
Author: Mika Westerberg <mika.westerberg.com>
Date:   Tue Apr 4 08:27:13 2023 +0300

    PCI/PM: Increase wait time after resume

    PCIe r6.0 sec 6.6.1 prescribes that a device must be able to respond to config requests within 1.0 s (PCI_RESET_WAIT) after exiting conventional reset and this same delay is prescribed when coming out of D3cold (as that involves reset too).

    A device that requires more than 1 second to initialize after reset may respond to config requests with Request Retry Status completions (sec 2.3.1), and we accommodate that in Linux with a 60 second cap (PCIE_RESET_READY_POLL_MS).

    Previously we waited up to PCIE_RESET_READY_POLL_MS only in the reset code path, not in the resume path. However, a device has surfaced, namely Intel Titan Ridge xHCI, which requires a longer delay also in the resume code path. Make the resume code path to use this same extended delay as the reset path.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=216728
    Link: https://lore.kernel.org/r/20230404052714.51315-2-mika.westerberg@xxxxxxxxxxxxxxx
    Reported-by: Chris Chiu <chris.chiu>
    Signed-off-by: Mika Westerberg <mika.westerberg.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas>
    Cc: Lukas Wunner <lukas>

 drivers/pci/pci-driver.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

I'm happy to add further details, perform additional debugging, or test some experimental patches in order to resolve this regression.

Please CC me in your replies, I'm not subscribed to this list.

Thank you!
Kamil Páral

#regzbot introduced: e8b908146d44