Thinking it might be related to timeouts (my Samsung Odyssey monitor
can sometimes take 15 seconds to come out of sleep and start
displaying), I set thunderbolt.dprx_timeout=100000, to no avail.
-K
On 3/1/25 20:57, Kenneth Crudup wrote:
Remember all those "__tb_path_deactivate_hop" messages you'd seen in my
previous pstore dumps? It was 'cause when I didn't get crashes with my
NVMe adaptor (which you found was caused by 9d573d1954) I was getting
these whenever I had an external monitor (all USB-C DP tunneled):
----
<4>[21119.295762][T22907] thunderbolt 0000:00:0d.2: 0:5: path does not
end on a DP adapter, cleaning up
<4>[21119.297327][T22907] Oops: Oops: 0000 [#1] PREEMPT SMP
<4>[21119.297334][T22907] CPU: 4 UID: 0 PID: 22907 Comm: systemd-sleep
Tainted: G S U 6.14.0-rc4-kenny+ #1
<4>[21119.297342][T22907] Tainted: [S]=CPU_OUT_OF_SPEC, [U]=USER
<4>[21119.297344][T22907] Hardware name: Dell Inc. XPS 9320/0KNXGD, BIOS
2.18.1 12/24/2024
<4>[21119.297347][T22907] RIP: 0010:__tb_path_deactivate_hop+0x5a/0x332
<4>[21119.297359][T22907] Code: 75 d0 41 89 d6 48 89 fa 48 c7 c7 68 49 fe a9 e8 dc 83 f8 ff 49 8b 47 20 41 0f b6 4f 50 41 b9 91 01 00 00 49 c7 c0 70 93 ab a9 <8b> b0 00 03 00 00 8b 90 04 03 00 00 48 8b 80 30 03 00 00 81 e2 ff
<4>[21119.297363][T22907] RSP: 0000:ffffab7a1f7f37a8 EFLAGS: 00010246
<4>[21119.297368][T22907] RAX: 0000000000000000 RBX: 0000000000000001
RCX: 0000000000000000
<4>[21119.297371][T22907] RDX: 0000000000000000 RSI: 0000000000000001
RDI: ffff8c00af51b780
<4>[21119.297375][T22907] RBP: ffffab7a1f7f37e8 R08: ffffffffa9ab9370
R09: 0000000000000191
<4>[21119.297379][T22907] R10: ffffffffaad58d88 R11: 0000000000000003
R12: 0000000051c7dd20
<4>[21119.297382][T22907] R13: ffffab7a1f7f37b0 R14: 000000000000001a
R15: ffffab7a00801b00
<4>[21119.297387][T22907] FS: 00007f4822dde940(0000)
GS:ffff8c00af500000(0000) knlGS:0000000000000000
<4>[21119.297393][T22907] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[21119.297397][T22907] CR2: 0000000000000300 CR3: 0000000424911002
CR4: 0000000000770ef0
<4>[21119.297401][T22907] PKRU: 55555554
<4>[21119.297404][T22907] Call Trace:
<4>[21119.297407][T22907] <TASK>
<4>[21119.297413][T22907] ? show_regs.part.0+0x1d/0x20
<4>[21119.297425][T22907] ? __die+0x52/0x91
<4>[21119.297436][T22907] ? page_fault_oops+0x9a/0x220
<4>[21119.297444][T22907] ? up+0x2d/0x60
<4>[21119.297450][T22907] ? exc_page_fault+0x2fc/0x5c0
<4>[21119.297460][T22907] ? asm_exc_page_fault+0x27/0x30
<4>[21119.297469][T22907] ? __tb_path_deactivate_hop+0x5a/0x332
<4>[21119.297476][T22907] ? __tb_path_deactivate_hop+0x44/0x332
<4>[21119.297483][T22907] __tb_path_deactivate_hops.cold+0x2e/0xaa
<4>[21119.297490][T22907] tb_path_deactivate+0x1e/0x110
<4>[21119.297496][T22907] tb_tunnel_deactivate+0x65/0x120
----
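For anyone reading the dump: the faulting bytes in the Code: line
(<8b> b0 00 03 00 00) decode as a 32-bit load from [rax + 0x300], and
with RAX = 0 that matches CR2 = 0x300, which is what a NULL structure
pointer dereference looks like. Below is a small userspace sketch of
that decode. The struct and function names are made up for
illustration, and it only handles this one instruction form; it is not
a general disassembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of a "mov r32, [base + disp32]" load (opcode 0x8b, no REX prefix). */
struct mov_load {
	unsigned int dst_reg;  /* ModRM.reg, e.g. 6 = esi */
	unsigned int base_reg; /* ModRM.rm,  e.g. 0 = rax */
	uint32_t disp;         /* raw 32-bit displacement */
};

/*
 * Decode only the form seen in the oops: opcode 0x8b, ModRM.mod == 2
 * (disp32 follows), rm != 4 (no SIB byte). Returns 0 on success, -1 if
 * the bytes are not in that form.
 */
static int decode_mov_load(const uint8_t *insn, size_t len, struct mov_load *out)
{
	uint8_t modrm;

	assert(insn != NULL && out != NULL);
	if (len < 6 || insn[0] != 0x8b)
		return -1;

	modrm = insn[1];
	if ((modrm >> 6) != 2 || (modrm & 7) == 4)
		return -1;

	out->dst_reg = (modrm >> 3) & 7;
	out->base_reg = modrm & 7;
	/* The displacement is stored little-endian after the ModRM byte. */
	out->disp = (uint32_t)insn[2] | (uint32_t)insn[3] << 8 |
		    (uint32_t)insn[4] << 16 | (uint32_t)insn[5] << 24;
	return 0;
}

/* Effective address: base register value plus the sign-extended disp32. */
static uint64_t fault_address(uint64_t base_value, const struct mov_load *m)
{
	return base_value + (uint64_t)(int64_t)(int32_t)m->disp;
}
```

Feeding it the six bytes from the dump and RAX = 0 gives base register
rax, destination esi, and an address of 0x300. The load just before it
in the Code: line (49 8b 47 20) fills rax from [r15 + 0x20], so the
NULL appears to come from a pointer member of whatever R15 points at.
Which structure that is cannot be told from the dump alone.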
So when I got home this afternoon I kept throwing more pr_info()
checkpoints all over, and found out this was the culprit (line 436/7 of
".../drivers/thunderbolt/path.c"):
----
return tb_port_write(port, &hop, TB_CFG_HOPS, 2 * hop_index, 2);
----
So I wrapped tb_port_write() with pr_info looking for bogus values and
found none (as well as none in the above call to it).
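For illustration, the kind of logging wrapper described above could
look roughly like the userspace sketch below. Everything in it is a
stand-in: mock_port, mock_port_write() and checked_port_write() are
hypothetical names, pr_info() is mapped onto fprintf(), and the
"bogus value" checks are only assumptions about what one might test
for. It is not the instrumentation that was actually used.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's pr_info(). */
#define pr_info(...) fprintf(stderr, __VA_ARGS__)

/* Hypothetical, heavily reduced stand-in for struct tb_port. */
struct mock_port {
	const void *sw; /* parent switch; NULL is treated as bogus */
	int port;       /* adapter number */
};

/* Mock of the real write; it only pretends to succeed. */
static int mock_port_write(struct mock_port *port, const void *buffer,
			   int space, uint32_t offset, uint32_t length)
{
	assert(port != NULL && buffer != NULL);
	(void)space;
	(void)offset;
	(void)length;
	return 0;
}

/* Log every argument, refuse obviously bad ones, then forward the call. */
static int checked_port_write(struct mock_port *port, const void *buffer,
			      int space, uint32_t offset, uint32_t length)
{
	int bogus = 0;

	if (!port || !port->sw)
		bogus++;
	if (!buffer || length == 0)
		bogus++;

	pr_info("%s: port=%p buffer=%p space=%d offset=%u length=%u bogus=%d\n",
		__func__, (void *)port, (void *)buffer, space,
		(unsigned int)offset, (unsigned int)length, bogus);

	if (bogus)
		return -EINVAL;
	return mock_port_write(port, buffer, space, offset, length);
}
```

A wrapper like this only validates what it is handed, so it comes back
clean whenever the bad pointer is picked up somewhere else.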
Taking a look at the underlying call to tb_cfg_write() didn't turn up
anything obvious, so on a whim I ran git log on .../drivers/thunderbolt,
took a chance and reverted the Subject: commit, and I haven't had a
resume/hibernate crash since (9d573d1954 is also reverted).
My typical topology is XPS-9320 -> TB Hub (I have a CalDigit TS4, a
Plugable TBT4-HUB3C, and a Belkin Thunderbolt 3 Dock Core; it happens on
all of them) and either a USB-C DP portable monitor or, at home, a
monitor on a USB-C-to-DisplayPort cable.
If there's any other information you need to help fix this, let me know.
-K
--
Kenneth R. Crudup / Sr. SW Engineer, Scott County Consulting, Orange
County CA