On Mon, Mar 03, 2025 at 04:33:08AM -0800, Kenneth Crudup wrote:
>
> OK, I may not be explaining the history properly, so more background:
>
> (I tend to run Linus' master that I pull every few days, partially 'cause I
> like to see all the new fixes and features, and partially 'cause over the
> years I'll stumble over bugs and help the subsystems' Maintainer(s) fix the
> problems.)
>
> Anyway, late last year I'd notice lately (it wasn't happening before) that
> once I'd get to the office, my laptop would be hard-hung on resume, which I
> eventually traced back to having my NVMe adaptor connected to my TB Dock
> when I suspended/hibernated. I'd started to try to bisect it, but couldn't
> find a good starting point (or one too far back) and would have to give up
> 'cause I'd run out of time. However, I'd mention the issue in the mailing
> lists, hoping for a solution- and that's when you'd discovered 9d573d19.
>
> But between your NVMe discovery (and by this time I was mostly :( careful
> about disconnecting the NVMe adaptor before suspend) and sometime around the
> beginning of the year I was also getting occasional hard-hangs on resume
> even if I hadn't had the NVMe adaptor connected on suspend. I'd seen where
> the pstore dumps were pointing to the display driver, so I'd switched back
> to the i915 from the xe driver, but that hadn't fixed it either. In the
> meantime, having seen one of the OOPses be in __tb_path_deactivate_hop(),
> I'd dropped some printks (actually "tb_port_info()", I think) at various
> points printing the line# so I could try and tell approximately where the
> crash occurred (yeah, I know I need to get my ksymoops up and running :) ).
> I hadn't made the correlation yet between having an external monitor
> connected or not, and having seen a number of xe/i915/dp/Thunderbolt changes
> come thru, was both hoping for the fix to be reported and corrected, or try
> and find time and find out why it was happening via my tracing.
>
> So in late February we'd had two failure modes for me in Linus' master:
> - 9d573d19 (NVMe adaptor connected on suspend causing an OOPS on resume)
> - d6d458d4 (OOPS if external USB-C DP monitor connected on resume)
>
> I couldn't/didn't recognize the 2nd issue fully until you'd discovered the
> cause of the first one.
>
> At home I have a Samsung Odyssey monitor connected to a USB-C-to-DP 2.1
> cable, to a TB port on a CalDigit TB4 dock.
>
> My travel bag has a generic Chinese USB-C DP tunneling portable monitor
> which is usually connected to a Plugable TB hub.
>
> In any case, the resume failures happen with either one.

Okay thanks for elaborating that.

> On 3/3/25 03:53, Mika Westerberg wrote:
>
> > I thought the system resumes fine after you reverted the other commit
> > (9d573d19), no? Just you don't get display tunneled so for example if you
> > login over ethernet (ssh) you should still be able to get full dmesg.
>
> Nah, it usually hard hangs if a monitor is connected when I resume; has to
> be power-cycled at that point.
>
> > We can actually take PCIe out of the equation so that you ask "boltctl" to
> > forget the device temporarily (or from the GNOME settings "privacy and
> > security" -> "Thunderbolt" then "forget device" for each). This means your
> > docks do not work fully but display should and then we hopefully can get
> > the dmesg.
>
> Well my topology is almost always Laptop -> Dock -> Monitor .

Okay.

> This workflow came about ironically enough 'cause my client has given me a
> MS Surface (Windows) machine with only one TB/USB-C port, and since I will
> physically switch to using my own machine, to minimize setup changes I just
> use the "one cable for all" approach (i.e., never connecting the external
> monitor to the other TB port on my XPS-9320).
>
> Oh and the failure mode for d6d458d4 is ALWAYS this, and always(?) from line
> 436/7 of ".../drivers/thunderbolt/path.c", a call to tb_port_write() :

That's also weird because we don't do anything for DP tunnels on resume so
what this code is doing is to clean up for the tunnels left by the boot
kernel (since this is hibernate). The code added by d6d458d4 is not run yet,
only later on when we get hotplugs from the connected device DP OUT adapter.

I will see if I can reproduce this on my setup, next.