On 2024-10-03 15:09, Mario Limonciello wrote:
> On 10/3/2024 13:51, Harry Wentland wrote:
>>
>>
>> On 2024-10-03 14:23, Harry Wentland wrote:
>>>
>>>
>>> On 2024-10-03 09:47, Mika Westerberg wrote:
>>>> On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
>>>>> On 10/3/2024 08:27, Mika Westerberg wrote:
>>>>>> On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
>>>>>>> On 10/3/2024 00:47, Mika Westerberg wrote:
>>>>>>>> Hi Harry,
>>>>>>>>
>>>>>>>> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>>>>>>>>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>>>>>>>>> my system hung at boot. No meaningful message showed on the kernel
>>>>>>>>> boot screen.
>>>>>>>>>
>>>>>>>>> A bisect revealed the culprit to be
>>>>>>>>>
>>>>>>>>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>>>>>>>>> Author: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx>
>>>>>>>>> Date: Fri Aug 30 18:26:29 2024 +0300
>>>>>>>>>
>>>>>>>>> usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
>>>>>>>>>
>>>>>>>>> A revert of this single patch "fixes" the issue and I can boot again.
>>>>>>>>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
>>>>>>>>> It's running Arch Linux but I doubt that's of consequence.
>>>>>>>>>
>>>>>>>>> lspci output:
>>>>>>>>> https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>>>>>>>>> acpidump:
>>>>>>>>> https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>>>>>>>>
>>>>>>>>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
>>>>>>>>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>>>>>>>>> on a kernel with the faulty patch, without USB functionality, obviously.
>>>>>>>>>
>>>>>>>>> I'd be happy to try any patches, provide more data, or run experiments.
>>>>>>>>
>>>>>>>> Do you boot with any device connected?
>>>
>>> Great question. A Thinkpad USB-C dock.
>>> When I unplug the dock at boot it
>>> boots fine and when I plug it in later the laptop charges from it and the
>>> dock's audio output works fine.
>>>
>>> In the midst of my experiments I also noticed at one point the dock
>>> wasn't charging my laptop and hard-resetting the laptop didn't fix that.
>>> I had to unplug the dock from the wall and plug it back. So there is
>>> likely some interaction going on with this particular dock that must've
>>> sent the dock's FW into a bad state.
>>>
>>> The dmesg with the revert and thunderbolt.dyndbg=+p is here
>>> https://gist.github.com/hwentland/7e25dedd3e707fdae1185d65224d4d66
>>>
>>
>> Apologies, that dmesg was from a build with a bad .config and has some
>> FW loading errors. They seem to be unrelated though. This is a dmesg
>> from a good build. It still has a wlan FW error but that shouldn't have
>> anything to do with the problem at hand.
>>
>> https://gist.github.com/hwentland/867f7afbf3df20547a877e794a8d8e6b
>>
>>> I don't see any PCIe tunneling option in my BIOS.
>>>
>>>>>>>> Second thing that I noticed, though I'm not familiar with AMD hardware,
>>>>>>>> but from your lspci dump, I do not see the PCIe ports that are being
>>>>>>>> used to tunnel PCIe. Does this system have PCIe tunneling disabled
>>>>>>>> somehow?
>>>>>>>
>>>>>>> On some OEM systems it's possible to lock down from BIOS to turn off PCIe
>>>>>>> tunneling, and I agree that looks like the most common cause.
>>>>>>>
>>>>>>> This is what you would see on a system that has tunnels (I checked on my
>>>>>>> side w/ Z series laptop w/ Rembrandt and a dock connected):
>>>>>>>
>>>>>>> +-03.0
>>>>>>> +-03.1-[03-32]--
>>>>>>> +-04.0
>>>>>>> +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>>>>>>> |                               \-04.0-[36-62]--
>>>>>>>
>>>>>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc.
>>>>>>> [AMD] Family 19h
>>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>>> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>>
>>>>>> Okay this is more like what I expected, although probably not the
>>>>>> reason here.
>>>>>>
>>>>>> Are you able to replicate the issue if you disable PCIe tunneling from
>>>>>> the BIOS on your reference system? (Probably not but just in case).
>>>>>
>>>>> I checked on the Lenovo Z13 laptop I have and turned off "USB port" in BIOS
>>>>> setup and this caused the endpoints 3.1 and 4.1 I listed above to disappear
>>>>> but the system still boots up just fine for me on 6.12-rc1.
>>>>
>>>> Okay thanks for checking!
>>>>
>>>>>>>> You don't see anything on the console? It's all blank or it just hangs
>>>>>>>> after some messages?
>>>>>>>
>>>
>>> It hangs after some messages.
>>>
>>>>>>> I guess it is getting stuck on fwnode_find_reference() because it never
>>>>>>> finds the given node?
>>>>>>
>>>>>> Looking at the code, I don't see where it could get stuck. If for some
>>>>>> reason there is no such reference (there is based on the ACPI dump) then
>>>>>> it should not affect the boot. It only matters when power management is
>>>>>> involved.
>>>>>
>>>>> Nothing jumps out to me either. Maybe this is a situation that Harry can
>>>>> sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to
>>>>> enlighten what's going on (assuming the console output is "working" when
>>>>> this happened).
>>>>
>>
>> I sprinkled printks but don't see any on the console.
>>
>
> You said it can work properly without the revert if you don't boot with the dock plugged in?
>

It can work properly without the revert if I boot without the dock plugged in.

> How about if you unplug it, does it unhang and you get everything flushed to the console?
>

Nothing happens.

> Or maybe magic sysrq with a backtrace (l) can help see where something is spinning.

Nothing happens. CONFIG_MAGIC_SYSRQ is enabled in my kernel.

Harry

>
>> Harry
>>
>>>> There are a couple of places there that may cause it to crash, I think.
>>>> And the __free() magic is something I cannot wrap my head around :(
>>>>
>>>> Anyways, Harry can you try the below patch and see if it makes any
>>>> difference? Also if it does please provide dmesg.
>>>>
>>>
>>> The patch doesn't seem to make a difference. Same hang on boot.
>>>
>>> Harry
>>>
>>>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
>>>> index 21585ed89ef8..90360f7ca905 100644
>>>> --- a/drivers/usb/core/usb-acpi.c
>>>> +++ b/drivers/usb/core/usb-acpi.c
>>>> @@ -157,6 +157,7 @@ EXPORT_SYMBOL_GPL(usb_acpi_set_power_state);
>>>>   */
>>>>  static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>>  {
>>>> +	struct fwnode_handle *nhi_fwnode;
>>>>  	const struct device_link *link;
>>>>  	struct usb_port *port_dev;
>>>>  	struct usb_hub *hub;
>>>> @@ -165,11 +166,12 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>>  		return 0;
>>>>  	hub = usb_hub_to_struct_hub(udev->parent);
>>>> -	port_dev = hub->ports[udev->portnum - 1];
>>>> +	if (WARN_ON(!hub))
>>>> +		return 0;
>>>> -	struct fwnode_handle *nhi_fwnode __free(fwnode_handle) =
>>>> -		fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
>>>> +	port_dev = hub->ports[udev->portnum - 1];
>>>> +	nhi_fwnode = fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
>>>>  	if (IS_ERR(nhi_fwnode))
>>>>  		return 0;
>>>> @@ -180,12 +182,14 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>>  	if (!link) {
>>>>  		dev_err(&port_dev->dev, "Failed to created device link from %s to %s\n",
>>>>  			dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>>>> +		fwnode_handle_put(nhi_fwnode);
>>>>  		return -EINVAL;
>>>>  	}
>>>> -	dev_dbg(&port_dev->dev, "Created device link from %s to %s\n",
>>>> -		dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>>>> +	dev_info(&port_dev->dev, "Created device link from %s to %s\n",
>>>> +		 dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>>>> +	fwnode_handle_put(nhi_fwnode);
>>>>  	return 0;
>>>>  }
>>>>
>>>
>>
>
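
For anyone else puzzled by the __free() annotation that the patch above replaces with explicit fwnode_handle_put() calls: in the kernel it is built on the compiler's cleanup attribute, which runs a handler when the annotated variable goes out of scope. Below is a minimal userspace sketch of the two styles, assuming GCC or Clang. struct node, node_get(), node_put(), the auto_put macro and check_balanced() are hypothetical stand-ins for illustration, not kernel APIs.

```c
#include <assert.h>

/* Hypothetical stand-in for a reference-counted firmware node. */
struct node {
	int refcount;
};

static struct node *node_get(struct node *n)
{
	n->refcount++;
	return n;
}

static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* A cleanup handler receives a pointer to the annotated variable. */
static void node_put_cleanup(struct node **np)
{
	node_put(*np);
}

/* Assumption: GCC/Clang extension, the mechanism __free() builds on. */
#define auto_put __attribute__((cleanup(node_put_cleanup)))

/* Scope-based style: the reference is dropped on every return path. */
int use_scoped(struct node *n, int fail)
{
	struct node *ref auto_put = node_get(n);

	if (fail)
		return -1;
	/* The return value is computed before the cleanup handler runs. */
	return ref->refcount > 0 ? 0 : -1;
}

/* Manual style: every return path needs its own put. */
int use_manual(struct node *n, int fail)
{
	struct node *ref = node_get(n);

	if (fail) {
		node_put(ref);	/* easy to forget on an error path */
		return -1;
	}
	node_put(ref);
	return 0;
}

/* Returns 0 after checking that both styles leave the count balanced. */
int check_balanced(void)
{
	struct node n = { .refcount = 1 };

	use_scoped(&n, 0);
	use_scoped(&n, 1);
	use_manual(&n, 0);
	use_manual(&n, 1);
	assert(n.refcount == 1);
	return 0;
}
```

Both functions leave the count where it started. The difference is only where the put is written: once at the declaration in the scoped style, and once per return path in the manual style.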