On Fri, 11 Oct 2024 19:38:11 +0300
Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx> wrote:

> Hi,
>
> On Thu, Oct 10, 2024 at 11:26:56PM -0500, Aaron Rainbolt wrote:
> > > Can you share full dmesg with the repro and
> > > "thunderbolt.dyndbg=+p" in the kernel command line?
> >
> > The full log is very long, so I've included it as an email
> > attachment. The exact steps taken after booting with the requested
> > kernel parameter were:
> >
> > 1. Boot with thunderbolt.dyndbg=+p kernel param, no USB-C plugged
> >    in.
> > 2. After login, hot-plug two USB-C cables. This time, the
> >    displays came up and stayed resident (this happens sometimes).
> > 3. Unplugged both cables.
> > 4. Replugged both. This time, the displays did not show anything.
> > 5. lspci -k "jiggled" the displays and they came back on.
> > 6. After ~15s, the displays blacked out again.
> > 7. Save to the dmesg file after about 30s.
> >
> > The laptop's firmware is fully up-to-date. One of the fixes we tried
> > was installing Windows 11, updating the firmware, and then
> > re-installing Kubuntu 24.04. This had no effect on the issue.
> >
> > Notes:
> >
> > * Kernel 6.1 does not exhibit this timeout. 6.5 and later do.
> > * Windows 11 had very similar behavior before installing Windows
> >   updates. After update, it was fixed.
> > * All distros and W11 were tested on the same hardware with the
> >   latest firmware, so we know this is not a hardware failure.
>
> Thanks for the logs and steps!
>
> I now realize that
>
>   a75e0684efe5 ("thunderbolt: Keep the domain powered when USB4 port
>   is in redrive mode")
>
> was half-baked. Yes, it deals with the situation where a monitor is
> plugged in while the domain is powered. However, it completely misses
> these cases:
>
>  * Plug in monitor to the Type-C port when the controller is runtime
>    suspended.
>  * Boot with monitor plugged in to the Type-C port.
>
> At the end of this email there is a hack patch that tries to solve
> this. Can you try it out?
> I will be on vacation next week but I'm
> copying my colleague Gil who is familiar with this too. He should be
> able to help you out during my absence.

Thank you so much! We have applied the patch to our kernel and are
recompiling now. I'll report back the results when testing is complete.

> Couple of notes about the dmesg you shared. They don't affect this
> issue but may cause other issues:
>
> > [ 1.382718] thunderbolt 0000:06:00.0: device links to tunneled
> > native ports are missing!
>
> This means the BIOS does not implement the USB4 power contract,
> which means that USB 3.x and PCIe tunnels will not work as expected
> after power transition.

Good to know, thank you.

> > [ 1.416488] thunderbolt 0000:06:00.0: 0: NVM version 14.86
>
> This is a really old firmware version. My development system for example
> has 56.x so yours might have a bunch of issues that are solved in the
> later versions.

Ah, ok. The machine used to reproduce the issue and generate the
dmesg log was *not* the same machine as the one we did our initial
testing on, though it was the exact same model. We used Windows to
fully update the firmware on the machine where we did the bulk of our
testing, and the exact same symptoms were observed with the latest
firmware. We'll try to update the firmware on any machines we use for
future log gathering.

Thanks again for your help with this!

> The hack patch below:
>
> diff --git a/drivers/thunderbolt/tb.c b/drivers/thunderbolt/tb.c
> index 07a66594e904..0e424b7661be 100644
> --- a/drivers/thunderbolt/tb.c
> +++ b/drivers/thunderbolt/tb.c
> @@ -2113,6 +2113,37 @@ static void tb_exit_redrive(struct tb_port *port)
>  	}
>  }
>
> +static void tb_switch_enter_redrive(struct tb_switch *sw)
> +{
> +	struct tb_port *port;
> +
> +	tb_switch_for_each_port(sw, port)
> +		tb_enter_redrive(port);
> +}
> +
> +/*
> + * Called during system and runtime suspend to forcefully exit redrive
> + * mode without querying whether the resource is available.
> + */
> +static void tb_switch_exit_redrive(struct tb_switch *sw)
> +{
> +	struct tb_port *port;
> +
> +	if (!(sw->quirks & QUIRK_KEEP_POWER_IN_DP_REDRIVE))
> +		return;
> +
> +	tb_switch_for_each_port(sw, port) {
> +		if (!tb_port_is_dpin(port))
> +			continue;
> +
> +		if (port->redrive) {
> +			port->redrive = false;
> +			pm_runtime_put(&sw->dev);
> +			tb_port_dbg(port, "exit redrive mode\n");
> +		}
> +	}
> +}
> +
>  static void tb_dp_resource_unavailable(struct tb *tb, struct tb_port *port,
>  				       const char *reason)
>  {
> @@ -2987,6 +3018,7 @@ static int tb_start(struct tb *tb, bool reset)
>  	tb_create_usb3_tunnels(tb->root_switch);
>  	/* Add DP IN resources for the root switch */
>  	tb_add_dp_resources(tb->root_switch);
> +	tb_switch_enter_redrive(tb->root_switch);
>  	/* Make the discovered switches available to the userspace */
>  	device_for_each_child(&tb->root_switch->dev, NULL,
>  			      tb_scan_finalize_switch);
> @@ -3002,6 +3034,7 @@ static int tb_suspend_noirq(struct tb *tb)
>
>  	tb_dbg(tb, "suspending...\n");
>  	tb_disconnect_and_release_dp(tb);
> +	tb_switch_exit_redrive(tb->root_switch);
>  	tb_switch_suspend(tb->root_switch, false);
>  	tcm->hotplug_active = false; /* signal tb_handle_hotplug to quit */
>  	tb_dbg(tb, "suspend finished\n");
> @@ -3094,6 +3127,7 @@ static int tb_resume_noirq(struct tb *tb)
>  		tb_dbg(tb, "tunnels restarted, sleeping for 100ms\n");
>  		msleep(100);
>  	}
> +	tb_switch_enter_redrive(tb->root_switch);
>  	/* Allow tb_handle_hotplug to progress events */
>  	tcm->hotplug_active = true;
>  	tb_dbg(tb, "resume finished\n");
> @@ -3157,6 +3191,8 @@ static int tb_runtime_suspend(struct tb *tb)
>  	struct tb_cm *tcm = tb_priv(tb);
>
>  	mutex_lock(&tb->lock);
> +	tb_disconnect_and_release_dp(tb);
> +	tb_switch_exit_redrive(tb->root_switch);
>  	tb_switch_suspend(tb->root_switch, true);
>  	tcm->hotplug_active = false;
>  	mutex_unlock(&tb->lock);
> @@ -3188,6 +3224,7 @@ static int tb_runtime_resume(struct tb *tb)
>  	tb_restore_children(tb->root_switch);
>  	list_for_each_entry_safe(tunnel, n, &tcm->tunnel_list, list)
>  		tb_tunnel_activate(tunnel);
> +	tb_switch_enter_redrive(tb->root_switch);
>  	tcm->hotplug_active = true;
>  	mutex_unlock(&tb->lock);
>
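
For anyone following along, the reference-count pairing the patch seems to
rely on can be modelled in user space. This is only a rough sketch under
stated assumptions, not the driver code: the model_* types and functions are
hypothetical stand-ins, the "monitor_attached" flag is an assumed trigger for
redrive mode (the patch does not show how tb_enter_redrive() decides), and
usage_count merely imitates the runtime PM usage counter that
pm_runtime_get()/pm_runtime_put() adjust in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel's types. */
struct model_port {
	bool is_dpin;           /* models the tb_port_is_dpin() check */
	bool monitor_attached;  /* assumed trigger for redrive mode */
	bool redrive;           /* mirrors port->redrive in the patch */
};

struct model_switch {
	int usage_count;        /* imitates the runtime PM usage counter */
	struct model_port *ports;
	size_t nports;
};

/*
 * Assumed behaviour of the enter path: take one reference per DP IN
 * port that goes into redrive mode. The "already in redrive" guard is
 * a choice made for this model so repeated calls stay balanced.
 */
void model_switch_enter_redrive(struct model_switch *sw)
{
	for (size_t i = 0; i < sw->nports; i++) {
		struct model_port *port = &sw->ports[i];

		if (!port->is_dpin || !port->monitor_attached || port->redrive)
			continue;
		port->redrive = true;
		sw->usage_count++;      /* stands in for pm_runtime_get() */
	}
}

/*
 * Mirrors the shape of tb_switch_exit_redrive() in the patch: drop the
 * reference for every DP IN port still marked as being in redrive mode.
 */
void model_switch_exit_redrive(struct model_switch *sw)
{
	for (size_t i = 0; i < sw->nports; i++) {
		struct model_port *port = &sw->ports[i];

		if (!port->is_dpin || !port->redrive)
			continue;
		port->redrive = false;
		assert(sw->usage_count > 0);
		sw->usage_count--;      /* stands in for pm_runtime_put() */
	}
}
```

In this model, calling model_switch_enter_redrive() on start or resume and
model_switch_exit_redrive() on suspend leaves the counter where it started,
which is the balance the added calls in tb_start(), tb_suspend_noirq(),
tb_resume_noirq(), tb_runtime_suspend() and tb_runtime_resume() appear to
aim for.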