[REGRESSION] resume with a Thunderbolt dock broke with commit e8b908146d44 "PCI/PM: Increase wait time after resume"

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



= Summary =
A Thinkpad T480s laptop with a Thinkpad Thunderbolt 3 Dock connected
can no longer resume from suspend. The problem was introduced in
e8b908146d44 "PCI/PM: Increase wait time after resume".

= Detailed description =
When running a kernel containing the identified commit and trying to
resume the laptop from sleep, the laptop's power light changes from
blinking (sleep state) to shining (running state), but the display
stays black and it doesn't respond to any keyboard input, nor to
ping/ssh, and no logs are written to the disk (which means I don't
know how to gather error logs). It needs to be force-rebooted. I
bisected the kernel and identified the commit which causes this
behavior. I used the vanilla kernel with a Fedora kernel config.

The reproducer is:
1. Connect the dock to the laptop.
2. Boot the laptop (in my case, to the gdm).
3. Suspend the laptop.
4. Resume the laptop. This is successful before the identified commit
(the last tested good commit was cc8a983d0fce), and unsuccessful
(black screen, frozen system) after the identified commit
(e8b908146d44).

The reproducibility is 100%, I tested it many many times in a row.

When the dock is unplugged, suspend and resume works as expected. When
I connect a different laptop to the dock (Thinkpad P1 gen3), I don't
see any resume failure. So this is somehow related to the particular
combination of Thinkpad T480s and Thinkpad Thunderbolt 3 Dock. The
dock is running the latest firmware. I also tested "pcie_aspm=off",
and that allows the laptop to resume properly, but then there's a race
condition whether devices on the dock are visible to the OS or not
after resume, so this is not useful even just as a workaround.


I already created a downstream Fedora bug report in Red Hat Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=2230357

lspci of the laptop:
https://bugzilla-attachments.redhat.com/attachment.cgi?id=1982541
git bisect log:
https://bugzilla-attachments.redhat.com/attachment.cgi?id=1983351


The commit which broke resume is the following:

e8b908146d44310473e43b3382eca126e12d279c is the first bad commit
commit e8b908146d44310473e43b3382eca126e12d279c
Author: Mika Westerberg <mika.westerberg.com>
Date:   Tue Apr 4 08:27:13 2023 +0300

    PCI/PM: Increase wait time after resume

    PCIe r6.0 sec 6.6.1 prescribes that a device must be able to respond to
    config requests within 1.0 s (PCI_RESET_WAIT) after exiting conventional
    reset and this same delay is prescribed when coming out of D3cold (as that
    involves reset too).

    A device that requires more than 1 second to initialize after reset may
    respond to config requests with Request Retry Status completions (sec
    2.3.1), and we accommodate that in Linux with a 60 second cap
    (PCIE_RESET_READY_POLL_MS).

    Previously we waited up to PCIE_RESET_READY_POLL_MS only in the reset code
    path, not in the resume path.  However, a device has surfaced, namely Intel
    Titan Ridge xHCI, which requires a longer delay also in the resume code
    path.

    Make the resume code path to use this same extended delay as the reset
    path.

    Link: https://bugzilla.kernel.org/show_bug.cgi?id=216728
    Link: https://lore.kernel.org/r/20230404052714.51315-2-mika.westerberg@xxxxxxxxxxxxxxx
    Reported-by: Chris Chiu <chris.chiu>
    Signed-off-by: Mika Westerberg <mika.westerberg.com>
    Signed-off-by: Bjorn Helgaas <bhelgaas>
    Cc: Lukas Wunner <lukas>

 drivers/pci/pci-driver.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)


I'm happy to add further details, perform additional debugging, or
test some experimental patches in order to resolve this regression.
Please CC me in your replies, I'm not subscribed to this list. Thank
you!
Kamil Páral


#regzbot introduced: e8b908146d44





[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux