Re: [REGRESSION] resume with a Thunderbolt dock broke with commit e8b908146d44 "PCI/PM: Increase wait time after resume"

On Mon, Aug 21, 2023 at 9:20 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> Wow, this is super interesting.  e8b908146d44 literally just increases
> a timeout; the complete patch is:
>
>    static void pci_pm_bridge_power_up_actions(struct pci_dev *pci_dev)
>    {
>   -       pci_bridge_wait_for_secondary_bus(pci_dev, "resume", PCI_RESET_WAIT);
>   +       pci_bridge_wait_for_secondary_bus(pci_dev, "resume",
>   +                                         PCIE_RESET_READY_POLL_MS);
>
> Increasing a timeout should never cause a failure like this, so
> there must be something really unexpected going on.

Hello Bjorn, thanks for a quick response.

Your reply helped me discover that the laptop doesn't really *fail* to
resume; the resume just takes much *longer*, and I never waited long
enough to see it finish. PCI_RESET_WAIT is 1 second and
PCIE_RESET_READY_POLL_MS is 60 seconds. If I wait long enough, the
laptop finally resumes correctly after roughly 70 seconds (before the
patch, the resume took roughly 5 seconds). Sorry for not spotting that
earlier!

I also tested this with the current git master tip (commit
f7757129e3de). Without any adjustments, the resume delay is roughly 70
seconds. But if I change PCIE_RESET_READY_POLL_MS from 60 seconds to 2
seconds and recompile, the resume delay drops to roughly 6 seconds.
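
To illustrate why the resume time tracks the timeout so closely, here is
a minimal sketch. It assumes the device behind the bridge never reports
ready, so the wait always runs until the timeout expires. The names
device_ready() and simulated_wait_ms() are hypothetical, the 100 ms
polling step is arbitrary, and this is not the kernel's actual wait
loop. Time is only counted, never slept.

```c
#include <stdbool.h>

/* Timeout values as stated in this thread, in milliseconds. */
#define PCI_RESET_WAIT            1000
#define PCIE_RESET_READY_POLL_MS  60000

/* Hypothetical stand-in for a device that never reports ready. */
bool device_ready(void)
{
	return false;
}

/*
 * Hypothetical fixed-step polling loop (an assumption for illustration,
 * not the kernel implementation). Returns the simulated number of
 * milliseconds spent waiting before giving up or seeing the device.
 */
int simulated_wait_ms(int timeout_ms)
{
	const int step_ms = 100;	/* arbitrary polling interval */
	int waited_ms = 0;

	while (waited_ms < timeout_ms) {
		if (device_ready())
			break;
		waited_ms += step_ms;
	}
	return waited_ms;
}
```

Under that assumption, simulated_wait_ms(PCI_RESET_WAIT) waits the full
1 second and simulated_wait_ms(PCIE_RESET_READY_POLL_MS) waits the full
60 seconds, which roughly matches the extra delay I observe.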

With the latest kernel f7757129e3de, here are some debugging logs:
* dmesg collected after delayed resume (extra 60 seconds):
  https://bugzilla-attachments.redhat.com/attachment.cgi?id=1984636
* system journal after delayed resume:
  https://bugzilla-attachments.redhat.com/attachment.cgi?id=1984637
* lspci -vv before suspend:
  https://bugzilla-attachments.redhat.com/attachment.cgi?id=1984638
* lspci -vv after delayed resume:
  https://bugzilla-attachments.redhat.com/attachment.cgi?id=1984639


> Would you mind
> collecting the output of "sudo lspci -vv" both with and without
> "pcie_aspm=off"?  No need to try suspend/resume to collect these.
>
> Also, what does this race condition look like?  Dock devices are
> visible before suspend, but sometimes none of them are visible *after*
> resume?  We don't re-enumerate on resume, so does this mean they still
> appear in lspci output but they just don't work?

I didn't manage to collect this today. Given the newly discovered
circumstances described above, I wonder whether your request still
applies. If it does, I can provide the lspci output tomorrow.

Thanks for looking into this,
Kamil Páral




