OK, looks like the changes made to the (now-recently-released) 6.11 have
fixed all the suspend/resume issues.
... and it turns out that my crashes on the CalDigit TB4 dock are
probably related to a Thunderbolt-to-NVMe enclosure that was always
plugged into the dock; apparently when resuming "something" was
waiting for the now-disconnected NVMe drive to come back, leading to the
hangs. Disconnecting the enclosure from the dock seems to prevent the
resume crashes.
I may try and root-cause that issue later, if I have time.
I guess we can call the Subject: issue mostly-solved.
-Kenny
On 9/13/24 14:59, Kenneth Crudup wrote:
Huh. This particular kernel is proving to be quite resilient, as in
"announce that it's been fixed, as that'll definitely make it break"
resilient.
I've done at least 5/6 suspend/resume cycles going between no dock,
USB-C/DP docks and now TB(USB4) docks and it's resumed properly every
time (and thanks to 9d573d195 even seems to recognize topology
changes too).
(My main USB4/TB dock is at home, a CalDigit 4 with a 7680x2160 DP
monitor on it; this tends to be the problematic dock for suspend/resumes
and provided calling these suspend/resume issues publicly "fixed"
doesn't invoke Murphy's Law I'll know if I've had continued success
tomorrow).
-K
On 9/12/24 23:11, Kenneth Crudup wrote:
Well, now get this- I'm back to running Linus' master (as of
79a61cc3fc0) and I've been trying to get resumes to fail and they
haven't (which means the next time I try after hitting "send" it's
going to fail spectacularly).
My SWAG is it may be related to commits 79a61cc3fc or 3e705251d998c9,
but I'll see if it breaks and if it doesn't, all the better :)
-K
On 9/12/24 22:25, Mika Westerberg wrote:
Hi,
On Thu, Sep 12, 2024 at 02:12:27PM -0700, Kenneth Crudup wrote:
I'll run the stuff you need, but now it looks like whatever is breaking
suspend/resume in Linus' master has been ported down from upstream into
6.10.10; I'm now getting the same panic()s as I did with master. I just
had a failed resume and the crash dump (which happened on its own) looks
the same as the one I'd posted here.
Is the crash you see something different from the hang? If you can catch
that with the backtrace and the register dump it should help.
Couple of additional steps to try:
- Unplug monitors from the dock and see if that makes it work (assuming
you have monitors connected).
- Disable PCIe tunneling and see if that makes it work. This means the
PCIe devices on the dock will not be functional, but it can point us
in the right direction. You can do this on a regular distro (Ubuntu,
Fedora, etc.) like:
$ boltctl config auth-mode disabled
Or go to "Settings" -> "Privacy & Security" -> "Thunderbolt" and flip
off the "Direct Access" switch.
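To check what the kernel sees after changing that setting and re-plugging
the dock, something like the following rough sketch may help. It reads the
"authorized" and "device_name" attributes from /sys/bus/thunderbolt/devices;
the function name list_tb_devices and its optional directory argument are
made up for illustration, and whether a device actually shows up as
unauthorized depends on the security level the system is running with.

```shell
#!/bin/sh
# Sketch only: list Thunderbolt/USB4 devices and their authorization state.
# list_tb_devices and its optional directory argument are illustrative names,
# not part of bolt or the kernel.

list_tb_devices() {
    root="${1:-/sys/bus/thunderbolt/devices}"
    for dev in "$root"/*; do
        # Entries without an "authorized" attribute (e.g. domains) are skipped.
        [ -f "$dev/authorized" ] || continue
        name="unknown"
        [ -f "$dev/device_name" ] && name=$(cat "$dev/device_name")
        printf '%s\t%s\tauthorized=%s\n' \
            "$(basename "$dev")" "$name" "$(cat "$dev/authorized")"
    done
}

list_tb_devices "$@"
```

"boltctl list" should show much the same information if boltd is running;
the sysfs route is just handy when it is not.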
I may try and find some time to bisect the issue, but it'll take
some time.
Sure.
--
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange County CA