OK, just did a resume after being suspended (for an hour, which somehow seems
to matter) while my CalDigit dock was attached with the ASMedia NVMe
adaptor at suspend, but both disconnected on resume, and I am indeed
locked up.
I can attach the "pstore" report if necessary.
Unfortunately I won't be able to get back to the CalDigit until Saturday
afternoon California time.
I'll be trying all the reverts/commits listed herein and at least check
for regressions in other cases, though.
-Kenny
On 2/26/25 00:44, Mika Westerberg wrote:
Hi Kenneth,
On Fri, Feb 14, 2025 at 09:39:33AM -0800, Kenneth Crudup wrote:
This is excellent news that you were able to reproduce it; I'd figured this
regression would have been caught already (as I do remember this working
before) and was worried it may have been specific to a particular piece of
hardware (or software setup) on my system.
I'll see what I can dig up on my end, but as I'm not an expert in these
subsystems I may not be able to diagnose anything until your return.
[Back now]
My git bisect ended up at this commit:
9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep")
Adding Lukas who is the expert.
My steps to reproduce on Intel Meteor Lake based reference system are:
1. Boot the system up, nothing connected.
2. Once up, connect Thunderbolt 4 dock and Thunderbolt 3 NVMe in a chain:
[Meteor Lake host] <--> [TB 4 dock] <--> [TB 3 NVMe]
3. Authorize PCIe tunnels (whatever your distro provides, my buildroot just
has the debugging tools so running 'tbauth -r 301')
4. Check that the PCIe topology matches the expected (lspci)
5. Enter s2idle:
# rtcwake -s 30 -mmem
6. Once it is suspended, unplug the cable between the host and the dock.
7. Wait for the resume to happen.
Expectation: The system wakes up fine, notices that the TB and PCIe devices
are gone, stays responsive and usable.
Actual result: Resume never completes.
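[Editorial sketch, not part of the original message] Steps 4, 5 and 7 above could be scripted roughly as follows, assuming a POSIX shell with lspci and rtcwake available and run as root. The function names and the /tmp paths are made up for illustration. Plugging in the chain, authorizing the tunnels and unplugging the cable during suspend remain manual steps. On an affected kernel the script would presumably never get past the rtcwake call, since the resume does not complete.

```shell
#!/bin/sh
# Hypothetical helper names; nothing here comes from the thread except
# the lspci and rtcwake commands themselves.

# Print the lines of file $1 ("before") that do not appear verbatim
# in file $2 ("after"), i.e. the devices that went away.
missing_devices() {
    # -F fixed strings, -x whole-line match, -v invert, -f patterns from file
    grep -v -x -F -f "$2" "$1"
}

# Snapshot the topology, enter s2idle for 30 seconds, then compare.
run_repro() {
    lspci > /tmp/lspci.before
    rtcwake -s 30 -m mem          # unplug the host<->dock cable while suspended
    lspci > /tmp/lspci.after
    echo "Devices gone after resume:"
    missing_devices /tmp/lspci.before /tmp/lspci.after
}
```

Calling run_repro after step 3 would, on a kernel without the regression, list the dock and NVMe devices as gone and return to the prompt.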
I added "no_console_suspend" to the command line and then did sysrq-w to
get a list of blocked tasks. I've attached it just in case it is needed.
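[Editorial sketch, not part of the original message] For anyone wanting to collect the same data, something like the following may help, assuming a root shell is still reachable (for example over a serial console). The helper names are hypothetical. Writing "w" to /proc/sysrq-trigger asks the kernel to dump blocked (uninterruptible) tasks to the kernel log. If the machine is fully hung, the keyboard SysRq combination is the fallback, and that one does depend on the kernel.sysrq setting.

```shell
#!/bin/sh
# Hypothetical helper names, for illustration only.

# Return 0 if kernel parameter $1 is present in the command line.
# $2 is the command line to inspect; defaults to the running kernel's.
cmdline_has() {
    line=${2-$(cat /proc/cmdline)}
    case " $line " in
        *" $1 "*|*" $1="*) return 0 ;;
        *) return 1 ;;
    esac
}

# Ask the kernel to log all blocked tasks, then show the log tail.
dump_blocked_tasks() {
    cmdline_has no_console_suspend ||
        echo "warning: no_console_suspend not set; console may stay silent" >&2
    echo w > /proc/sysrq-trigger
    dmesg | tail -n 200
}
```

The 200-line tail is an arbitrary choice; a busy system may need more to capture every blocked task.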
If I revert the above commit the issue is gone. Now I'm not sure if this is
exactly the same issue that you are seeing, but nevertheless this is a fairly
normal use case, so definitely something we should get fixed.
Lukas, if you need any more information let me know. I can reproduce this
easily.
I also saw some DRM/connected fixes posted to Linus' master so maybe one of
them corrects this new display-crash issue (I'm not home on my big monitor
to be able to test yet).
-Kenny
On 2/14/25 08:29, Mika Westerberg wrote:
Hi,
On Thu, Feb 13, 2025 at 11:19:35AM -0800, Kenneth Crudup wrote:
On 2/13/25 05:59, Mika Westerberg wrote:
Hi,
As Murphy's Law would have it, now my crashes are display-driver related (this
is Xe, but I've also seen it with i915).
Attached here just for the heck of it, but I'll be better testing the NVMe
enclosure-related failures this weekend. Stay tuned!
Okay, I checked quickly and no TB related crash there but I was actually
able to reproduce a hang when I unplug the device chain during suspend. I did
not yet have time to look into it deeper. I'm sure this has been working
fine in the past as we tested all kinds of topologies including similar to
this.
I will be out next week for vacation but will continue after that if the
problem is not already solved ;-)
--
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange County
CA