Re: [REGRESSION] resume with a Thunderbolt dock broke with commit e8b908146d44 "PCI/PM: Increase wait time after resume"

On Tue, Sep 26, 2023 at 12:55:30PM -0500, Bjorn Helgaas wrote:
> On Mon, Sep 25, 2023 at 04:19:30PM +0200, Lukas Wunner wrote:
> > On Mon, Sep 25, 2023 at 08:48:41AM -0500, Bjorn Helgaas wrote:
> > > Now pciehp thinks the slot is occupied and the link is up, so we
> > > re-enumerate the hierarchy.  Is this because thunderbolt did something
> > > to 06:00.0 that made the link from 05:01.0 come up?
> > 
> > PCIe TLPs are encapsulated into Thunderbolt packets and transmitted
> > alongside DisplayPort and other data over the same physical link.
> > 
> > For this to work, PCIe tunnels need to be set up between the Thunderbolt
> > host controller and attached devices.  Once a tunnel is established,
> > the PCIe link magically goes up and TLPs can be transmitted.
> > 
> > There are two ways to establish those tunnels:
> > 
> > 1/ By a firmware in the Thunderbolt host controller.
> >    (firmware or "internal" connection manager, drivers/thunderbolt/icm.c)
> > 
> > 2/ Natively by the kernel.
> >    (software connection manager)
> > 
> > I'm assuming that the laptop in question exclusively uses the firmware
> > connection manager, hence the kernel is reliant on that firmware to
> > establish tunnels and can't really do anything if it fails to do so.
> 
> Thanks for the background; that improves my meager understanding a
> lot.
> 
> Since this seems to be a firmware issue, it does sound like this
> laptop uses a firmware connection manager.  But there still seems to
> be some kernel connection because pre-e8b908146d44, the link came up
> in <5 seconds, and after the minor e8b908146d44 change, it takes >60
> seconds.

In both cases (with or without the commit), what happens is that after
resume has finished, the firmware connection manager notices the
connection and announces it to the Thunderbolt driver, which exposes it
to userspace, where boltd re-authorizes the device. This brings the
PCIe tunnel back up and things start working again.

(What is expected to happen is that during the resume the firmware
 connection manager re-connects the PCIe tunnel.)

Previously it took ~5s before resume completed and the above steps
could happen, whereas after the commit this is delayed further, up to
~60s. That value is arbitrary: it is the PCIE_RESET_READY_POLL_MS
timeout, which we started to use for this wait with commit
e8b908146d44.
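
To illustrate the timing described above, here is a minimal Python
sketch. It is only a toy model, not kernel code: simulate_resume(), the
100 ms poll interval and the 1 s re-authorization time are made up for
illustration, and the ~5s and ~60s timeouts are the approximate values
mentioned above. It assumes the firmware connection manager never
restores the tunnel during resume, so the wait always runs to its
timeout.

```python
def simulate_resume(poll_timeout_ms, reauthorize_ms, poll_interval_ms=100):
    """Toy model: return (ms spent waiting in resume, ms until tunnel is back)."""
    elapsed_ms = 0
    # Assumption: the firmware connection manager does not re-connect the
    # PCIe tunnel during resume, so the link never comes up while we poll.
    link_up = False
    while not link_up and elapsed_ms < poll_timeout_ms:
        elapsed_ms += poll_interval_ms  # simulated sleep between polls
    # Resume completes only now; userspace (boltd) can then re-authorize
    # the device, which brings the tunnel back up.
    return elapsed_ms, elapsed_ms + reauthorize_ms


# Hypothetical re-authorization time of 1 s in both cases.
before = simulate_resume(poll_timeout_ms=5_000, reauthorize_ms=1_000)
after = simulate_resume(poll_timeout_ms=60_000, reauthorize_ms=1_000)
print("before commit:", before)
print("after commit: ", after)
```

In this model the re-authorization step itself is unchanged; only the
time spent waiting before resume completes grows with the timeout.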

> I'm kind of at a loss here because I don't have a clear path forward.
> What I'm hearing is that the real fix is a firmware update or a BIOS
> setting change (Thunderbolt "user" instead of "secure" mode).

There are lots of firmwares involved, so if any of their settings is
changed from the default value, the system may unfortunately enter code
paths that are not fully validated.

I would also try changing all the BIOS settings back to defaults, check
that it works (it is probably at the "user" security level then), then
switch back to "secure" (changing only this one option) and see whether
it now works. It could be that some setting just did not get committed
properly.

> That is problematic for users, who will think resume got broken and
> they don't know how to fix it.  It's problematic for me, because it
> *looks* like a PCI issue and a PCI change exposed it, so I'll have to
> deal with the reports.

I'm sorry about that. I'm trying as best I can to remedy this.


