Re: diagnosing resume failures after disconnected USB4 drives (Was: Re: PCI/ASPM: Fix L1SS saving (linus/master commit 7507eb3e7bfac))

Trying to do a "control" test before I try out your bisected commit and
Lukas' changes, but of course now I can't get it to fail (I'm on Linus'
master as of this morning, b5799106b4).

I'm using my portable USB4 dock (Plugable TBT4-HUB3C) this time (vs. my
CalDigit 4 dock) but the same ASMedia USB4-to-NVMe adapter as always; in
any case, everything is PCIe, so it shouldn't matter.

I don't normally use "tbauth" (I think that's all done for me via the
"boltctl" suite), but I grabbed it from Git, built it and ran it anyway,
for good measure.
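For anyone following along without bolt or tbauth: the kernel exposes each Thunderbolt/USB4 device's authorization state in sysfs as /sys/bus/thunderbolt/devices/<device>/authorized, where 0 means its PCIe tunnels have not been authorized yet. The helper below is a hypothetical sketch, not part of bolt or tbtools; it only reads those files, and the optional directory argument exists so it can be pointed at a test tree instead of the real sysfs.

```shell
#!/bin/sh
# Hypothetical helper (not part of bolt or tbtools): list the authorization
# state of every Thunderbolt/USB4 device the kernel currently knows about.
# Usage: tb_list_auth [SYSFS_DIR]   (defaults to the real sysfs location)
tb_list_auth() {
    dir=${1:-/sys/bus/thunderbolt/devices}
    for f in "$dir"/*/authorized; do
        # If nothing matches, $f holds the literal pattern; skip it.
        [ -e "$f" ] || continue
        dev=$(basename "$(dirname "$f")")
        # 0 = not authorized yet, non-zero = authorized.
        printf '%s authorized=%s\n' "$dev" "$(cat "$f")"
    done
}

# Manual authorization (as root) is a write of 1 to the same attribute;
# "0-1" is only an example device name:
#   echo 1 > /sys/bus/thunderbolt/devices/0-1/authorized
```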

I'll keep you updated; I'll be at my CalDigit dock soon enough if I
can't get any failures this morning.

-K


On 2/26/25 00:44, Mika Westerberg wrote:
Hi Kenneth,

On Fri, Feb 14, 2025 at 09:39:33AM -0800, Kenneth Crudup wrote:

It's excellent news that you were able to reproduce it. I'd figured this
regression would have been caught already (as I do remember this working
before) and was worried it might be specific to a particular piece of
hardware (or software setup) on my system.

I'll see what I can dig up on my end, but as I'm not an expert in these
subsystems, I may not be able to diagnose anything until you return.

[Back now]

My git bisect ended up at this commit:

   9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep")

Adding Lukas who is the expert.

My steps to reproduce on Intel Meteor Lake based reference system are:

1. Boot the system up, nothing connected.
2. Once up, connect Thunderbolt 4 dock and Thunderbolt 3 NVMe in a chain:

   [Meteor Lake host] <--> [TB 4 dock] <--> [TB 3 NVMe]

3. Authorize PCIe tunnels (using whatever your distro provides; my buildroot
     just has the debugging tools, so I run 'tbauth -r 301')

4. Check that the PCIe topology matches the expected (lspci)

5. Enter s2idle:

   # rtcwake -s 30 -mmem

6. Once it is suspended, unplug the cable between the host and the dock.

7. Wait for the resume to happen.

Expectation: The system wakes up fine, notices that the TB and PCIe devices
are gone, stays responsive and usable.

Actual result: Resume never completes.
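Steps 4 and 7 can be checked mechanically by saving the list of PCI devices before suspending and comparing it with the list after resume; with the host-to-dock cable unplugged, every device behind the dock should show up as gone. The two functions below are a hypothetical sketch (they are not part of any existing tool) that only read /sys/bus/pci/devices and use comm(1) for the comparison.

```shell
#!/bin/sh
# Hypothetical helpers for steps 4 and 7 of the reproduction above.

# Print the PCI addresses (domain:bus:device.function) currently present,
# one per line and sorted so the output can be fed to comm(1).
pci_snapshot() {
    ls /sys/bus/pci/devices 2>/dev/null | sort
}

# Print the addresses listed in snapshot file $1 but missing from $2,
# i.e. the devices that disappeared between the two snapshots.
pci_gone() {
    comm -23 "$1" "$2"
}

# Example session (as root; the second snapshot is only reachable if
# resume actually completes):
#   pci_snapshot > before.txt
#   rtcwake -s 30 -mmem        # unplug the host<->dock cable while suspended
#   pci_snapshot > after.txt
#   pci_gone before.txt after.txt
```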

I added "no_console_suspend" to the kernel command line and then did
sysrq-w to get a list of blocked tasks. I've attached it just in case it is needed.
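Both prerequisites for that kind of capture can be checked before suspending. As root, writing w to /proc/sysrq-trigger produces the same blocked-task dump from a running shell, but during a hung resume userspace is typically still frozen, so the keyboard (or serial) SysRq path is the practical one, and that path is gated by the kernel.sysrq sysctl: 1 enables every function, while any other value is a bitmask in which 8 enables the debugging dumps, including w. The functions below are a hypothetical sketch that take the values as arguments so they can be exercised without root.

```shell
#!/bin/sh
# Hypothetical pre-flight checks before reproducing a resume hang.

# has_no_console_suspend CMDLINE: succeed if the kernel command line
# contains no_console_suspend as a whole word, so console output (such as
# a sysrq-w dump) is not suspended along with the rest of the system.
has_no_console_suspend() {
    case " $1 " in
        *" no_console_suspend "*) return 0 ;;
        *) return 1 ;;
    esac
}

# sysrq_allows_dump VALUE: succeed if a kernel.sysrq value lets the
# keyboard trigger sysrq-w. 1 enables all functions; any other value is
# a bitmask, and the bit with value 8 enables process/debug dumps.
sysrq_allows_dump() {
    [ "$1" -eq 1 ] || [ $(( $1 & 8 )) -ne 0 ]
}

# Example (as root):
#   has_no_console_suspend "$(cat /proc/cmdline)" || echo "add no_console_suspend"
#   sysrq_allows_dump "$(cat /proc/sys/kernel/sysrq)" || echo 1 > /proc/sys/kernel/sysrq
```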

If I revert the above commit, the issue is gone. Now, I'm not sure if this is
exactly the same issue that you are seeing, but nevertheless this is a fairly
normal use case, so it's definitely something we should get fixed.

Lukas, if you need any more information let me know. I can reproduce this
easily.

I also saw some DRM/connected fixes posted to Linus' master, so maybe one of
them corrects this new display-crash issue (I'm not home at my big monitor
yet, so I can't test it).

-Kenny

On 2/14/25 08:29, Mika Westerberg wrote:
Hi,

On Thu, Feb 13, 2025 at 11:19:35AM -0800, Kenneth Crudup wrote:

On 2/13/25 05:59, Mika Westerberg wrote:

Hi,

As Murphy's Law would have it, now my crashes are display-driver related
(this is Xe, but I've also seen it with i915).

Attached here just for the heck of it, but I'll be testing the NVMe
enclosure-related failures more thoroughly this weekend. Stay tuned!

Okay, I checked quickly and there's no TB-related crash there, but I was
actually able to reproduce a hang when I unplug the device chain during
suspend. I have not yet had time to look into it more deeply. I'm sure this
has worked fine in the past, as we tested all kinds of topologies, including
ones similar to this.

I will be out next week on vacation but will continue after that if the
problem is not already solved ;-)


--
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange County CA
