On Fri, Sep 22, 2023 at 07:59:26AM -0500, Bjorn Helgaas wrote:
> [+cc Thorsten]
>
> On Fri, Sep 22, 2023 at 07:42:37AM +0300, Mika Westerberg wrote:
> > On Thu, Sep 21, 2023 at 03:19:45PM -0500, Bjorn Helgaas wrote:
> > > On Mon, Sep 18, 2023 at 08:30:41AM +0300, Mika Westerberg wrote:
> > ...
>
> > > Kamil also bisected a 60+ second resume delay to e8b908146d44
> > > (https://lore.kernel.org/r/CA+cBOTeWrsTyANjLZQ=bGoBQ_yOkkV1juyRvJq-C8GOrbW6t9Q@xxxxxxxxxxxxxx),
> > > but IIUC at
> > > https://lore.kernel.org/linux-pci/20230824114300.GU3465@xxxxxxxxxxxxxxxxxx/T/#u
> > > you concluded that Kamil's issue was related to firmware and actually
> > > had nothing to do with e8b908146d44.
> > >
> > > Do you still think Kamil's issue is unrelated to e8b908146d44 and this
> > > patch? If so, how do we handle Kamil's issue? An answer like "users
> > > of v6.4+ must upgrade their Thunderbolt firmware" seems like it would
> > > be kind of a nightmare for users.
> >
> > It's a different issue. What happens in his system is that the link went
> > down even though the dock was still connected and this should not happen
> > (the firmware should bring the link up during resume). The delay was
> > just a "symptom".
>
> Do you have any leads for Kamil's issue? If we had known that
> e8b908146d44 would cause that problem, we never would have applied it
> in the first place.

I explained it in the other email I just sent. I should mention here
that the two issues are different.

> No OS would accept that resume delay, so there must be some way to fix
> that in the OS without requiring a firmware update.

It is not a "resume" delay. It is the time we wait for the device to
become ready before we decide that it is not functional or has been
disconnected. That delay is completely arbitrary.

> If Kamil's issue is that firmware doesn't bring up the link during
> resume, how *does* the link get brought up, and what does the delay
> have to do with it?

The PCIe tunnel (the "link" above) gets established after D3cold by the
Thunderbolt firmware running inside the host controller. The trigger is
typically the call to the _PR0 ACPI method, which sends a special
command through the mailbox that makes the firmware re-connect all the
tunnels that were previously connected.

The delay we are talking about here is the one the PCIe spec requires
the OS to observe after the device has gone through a reset, before it
can send configuration requests to that device. Now, the PCIe spec does
not specify how long the OS should wait for a device on a link that does
not come up. We increased that delay to ~60s to fix another issue on an
xHCI controller but overlooked the fact that when the device is
deliberately unplugged we still wait for the ~60s, which is wasted
effort and just ends up annoying users.
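
To make the distinction concrete, here is a rough userspace sketch of the idea. It is not the kernel code and not the patch under discussion. It only models a wait loop that keeps the long ~60s bound when the link is up but the device is slow, and gives up early when the link never comes up. All names (`sim_dev`, `wait_for_dev`) and the 1s link-up bound are assumptions made up for illustration.

```c
#include <stdbool.h>

/* Hypothetical values for illustration; not taken from the kernel source. */
#define READY_TIMEOUT_MS   60000 /* the ~60s upper bound discussed above */
#define LINK_UP_TIMEOUT_MS  1000 /* assumed short bound if the link never comes up */

/* Hypothetical model of a device behind a (tunneled) PCIe link. */
struct sim_dev {
	int link_up_at_ms; /* simulated time the link becomes active, -1 = never */
	int ready_at_ms;   /* simulated time config requests succeed, -1 = never */
};

static bool link_active(const struct sim_dev *d, int now_ms)
{
	return d->link_up_at_ms >= 0 && now_ms >= d->link_up_at_ms;
}

static bool dev_ready(const struct sim_dev *d, int now_ms)
{
	return link_active(d, now_ms) &&
	       d->ready_at_ms >= 0 && now_ms >= d->ready_at_ms;
}

/*
 * Poll with exponential backoff (capped once the step exceeds 1s).
 * Returns the simulated time in ms at which we stopped waiting and
 * sets *ok to whether the device became ready.
 */
static int wait_for_dev(const struct sim_dev *d, bool *ok)
{
	int now = 0, delay = 1;

	for (;;) {
		if (dev_ready(d, now)) {
			*ok = true;
			return now;
		}

		/* Only grant the long timeout once the link is actually up. */
		int limit = link_active(d, now) ? READY_TIMEOUT_MS
						: LINK_UP_TIMEOUT_MS;
		if (now >= limit) {
			*ok = false; /* treat as not functional/disconnected */
			return now;
		}

		now += delay; /* stands in for sleeping 'delay' ms */
		if (delay < 1000)
			delay *= 2;
	}
}
```

With this model an unplugged device (link never up) is given up on after about a second, while a slow device behind a working link still gets the full ~60s before it is declared not functional.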