Re: diagnosing resume failures after disconnected USB4 drives (Was: Re: PCI/ASPM: Fix L1SS saving (linus/master commit 7507eb3e7bfac))

Hi,

On Thu, Feb 27, 2025 at 09:46:07AM -0800, Kenneth Crudup wrote:
> So I think, the failure mode may be related in some part to DP/Tunneling,
> too- I finally got another lockup (this time, after a hibernate, which I
> guess is some of the same facility) but what was different about this time
> where I couldn't reproduce the lockups (and what happens when I use my
> CalDigit dock) was I had an external USB-C monitor connected when I resumed,
> and when I'm home (where I sometimes forget to remove the NVMe USB4 adaptor)
> I always have my monitor connected to the dock.

It would be good to stick with a "proven" use-case so that the steps are
always the same. This may involve several issues in various parts of the
kernel and we need to track them down one by one. If you change the steps
midway we may end up finding completely different issues, which does not
help the debugging effort.

The steps at the moment would be simply this:

1. Boot the system up, nothing connected.
2. Connect Thunderbolt dock and make sure UI authorizes it.
3. Connect Thunderbolt NVMe to the Thunderbolt dock and make sure UI authorizes it.
4. Verify that the devices behind PCIe tunnels are visible and functional (lspci for example)
5. Suspend the laptop by closing lid.
6. Unplug the dock (and the NVMe).
7. Resume the laptop by opening the lid.

Expectation: The system resumes just fine, finds the devices gone and stays functional.
Actual result: The system does not resume properly; it appears to crash and
	       the screen stays black.
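
For step 4, a minimal sketch of one way to check that the dock and the NVMe
are present and authorized before suspending. It assumes the standard
Thunderbolt sysfs layout (/sys/bus/thunderbolt/devices with "authorized" and
"device_name" attributes); the TB_SYSFS variable and the function name are
only illustrative and not part of any existing tool.

```shell
#!/bin/sh
# Sketch only: list Thunderbolt/USB4 devices and their authorization state.
# TB_SYSFS is a hypothetical override so the function can be pointed at a
# different directory; by default it reads the real sysfs path.
TB_SYSFS="${TB_SYSFS:-/sys/bus/thunderbolt/devices}"

list_tb_devices() {
    for dev in "$TB_SYSFS"/*; do
        # Skip entries without an "authorized" attribute (e.g. domains).
        [ -f "$dev/authorized" ] || continue
        # device_name may be missing; fall back to "unknown".
        name=$(cat "$dev/device_name" 2>/dev/null || echo unknown)
        # 0 means connected but not authorized, non-zero means authorized.
        auth=$(cat "$dev/authorized")
        printf '%s %s authorized=%s\n' "${dev##*/}" "$name" "$auth"
    done
}

list_tb_devices
```

Running "lspci -tv" afterwards should then show the tunneled PCIe devices
(the dock's bridges and the NVMe controller) in the tree.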

Please correct me if I got something wrong. This is essentially what happens
when you go from work to home: you unplug the dock at work and then resume
the laptop at home.

The other thing is that in the pstore I see these:

thunderbolt 0000:00:0d.2: 0:5: __tb_path_deactivate_hop(): 401

but there is no such log message in mainline. If you have made any local
changes, I suggest dropping all of them to make sure we are looking at the
same source code.
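
A quick way to check for uncommitted local changes, as a rough sketch. It
assumes the kernel was built from a git checkout; the function name
tree_is_clean is made up for illustration. Note that it only catches
uncommitted modifications to tracked files, not extra patches that were
committed on top of mainline.

```shell
#!/bin/sh
# Sketch only: returns 0 if the work tree at $1 has no uncommitted changes
# to tracked files, 1 if it has, and 2 if git could not inspect the path.
tree_is_clean() {
    out=$(git -C "$1" status --porcelain --untracked-files=no) || return 2
    [ -z "$out" ]
}

# Example use from the top of the kernel source tree:
#   tree_is_clean . && echo "no local modifications"
```

"git describe --dirty" gives similar information in one line, with a
"-dirty" suffix when tracked files have been modified.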

> See attached dump log. I'm using the (somewhat still experimental) Xe
> display driver, but I've seen this same lockup happen with i915.

Please also keep using the same graphics driver.

> In any case, I've now reverted 9d573d19, and when I get back to my CalDigit
> I can try instrumenting the code paths in the commit and see exactly where
> we're locking up.

No need to add any changes. Just try with the revert and see if that at
least makes the system resume properly. If it does, there could still be
other issues, but then you can capture the full dmesg and send it to us
instead of those pstore snippets.



