On 2024-10-03 14:23, Harry Wentland wrote:
>
>
> On 2024-10-03 09:47, Mika Westerberg wrote:
>> On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
>>> On 10/3/2024 08:27, Mika Westerberg wrote:
>>>> On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
>>>>> On 10/3/2024 00:47, Mika Westerberg wrote:
>>>>>> Hi Harry,
>>>>>>
>>>>>> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>>>>>>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>>>>>>> my system hung at boot. No meaningful message showed on the kernel
>>>>>>> boot screen.
>>>>>>>
>>>>>>> A bisect revealed the culprit to be
>>>>>>>
>>>>>>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>>>>>>> Author: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx>
>>>>>>> Date: Fri Aug 30 18:26:29 2024 +0300
>>>>>>>
>>>>>>> usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
>>>>>>>
>>>>>>> A revert of this single patch "fixes" the issue and I can boot again.
>>>>>>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
>>>>>>> It's running Arch Linux but I doubt that's of consequence.
>>>>>>>
>>>>>>> lspci output:
>>>>>>> https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>>>>>>> acpidump:
>>>>>>> https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>>>>>>
>>>>>>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
>>>>>>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>>>>>>> on a kernel with the faulty patch, without USB functionality, obviously.
>>>>>>>
>>>>>>> I'd be happy to try any patches, provide more data, or run experiments.
>>>>>>
>>>>>> Do you boot with any device connected?
>
> Great question. A Thinkpad USB-C dock. When I unplug the dock at boot it
> boots fine and when I plug it in later the laptop charges from it and the
> dock's audio output work fine.
>
> In the midst of my experiments I also noticed at one point the dock
> wasn't charging my laptop and hard-resetting the laptop didn't fix that.
> I had to unplug the dock from the wall and plug it back. So there is
> likely some interaction going on with this particular dock that must've
> sent the dock's FW into a bad state.
>
> The dmesg with the revert and thunderbolt.dyndbg=+p is here
> https://gist.github.com/hwentland/7e25dedd3e707fdae1185d65224d4d66
>

Apologies, that dmesg was from a build with a bad .config and has
some FW loading errors. They seem to be unrelated though.

This is a dmesg from a good build. It still has a wlan FW error
but that shouldn't have anything to do with the problem at hand.

https://gist.github.com/hwentland/867f7afbf3df20547a877e794a8d8e6b

> I don't see any PCIe tunneling option in my BIOS.
>
>>>>>>> Second thing that I noticed, though I'm not familiar with AMD hardware,
>>>>>> but from your lspci dump, I do not see the PCIe ports that are being
>>>>>> used to tunnel PCIe. Does this system have PCIe tunneling disabled
>>>>>> somehow?
>>>>>
>>>>> On some OEM systems it's possible to lock down from BIOS to turn off PCIe
>>>>> tunneling, and I agree that looks like the most common cause.
>>>>>
>>>>> This is what you would see on a system that has tunnels (I checked on my
>>>>> side w/ Z series laptop w/ Rembrandt and a dock connected):
>>>>>
>>>>> +-03.0
>>>>> +-03.1-[03-32]--
>>>>> +-04.0
>>>>> +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>>>>> |                               \-04.0-[36-62]--
>>>>>
>>>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>
>>>> Okay this is more like what I expected, although probably not the
>>>> reason here.
>>>>
>>>> Are you able to replicate the issue if you disable PCIe tunneling from
>>>> the BIOS on your reference system? (Probably not but just in case).
>>>
>>> I checked on the Lenovo Z13 laptop I have and turned off "USB port" in BIOS
>>> setup and this caused the endpoints 3.1 and 4.1 I listed above to disappear
>>> but the system still boots up just fine for me on 6.12-rc1.
>>
>> Okay thanks for checking!
>>
>>>>>> You don't see anything on the console? It's all blank or it just hangs
>>>>>> after some messages?
>>>>>
>
> It hangs after some messages.
>
>>>>> I guess it is getting stuck on fwnode_find_reference() because it never
>>>>> finds the given node?
>>>>
>>>> Looking at the code, I don't see where it could get stuck. If for some
>>>> reason there is no such reference (there is based on the ACPI dump) then
>>>> it should not affect the boot. It only matters when power management is
>>>> involved.
>>>
>>> Nothing jumps out to me either. Maybe this is a situation that Harry can
>>> sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to
>>> enlighten what's going on (assuming the console output is "working" when
>>> this happened).
>>

I sprinkled printks but don't see any on the console.

Harry

>> There are couple of places there that may cause it to crash, I think.
>> And the __free() magic is something I cannot wrap my head around :(
>>
>> Anyways, Harry can you try the below patch and see if it makes any
>> difference? Also if it does please provide dmesg.
>>
>
> The patch doesn't seem to make a difference. Same hang on boot.
>
> Harry
>
>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
>> index 21585ed89ef8..90360f7ca905 100644
>> --- a/drivers/usb/core/usb-acpi.c
>> +++ b/drivers/usb/core/usb-acpi.c
>> @@ -157,6 +157,7 @@ EXPORT_SYMBOL_GPL(usb_acpi_set_power_state);
>>   */
>>  static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>  {
>> +	struct fwnode_handle *nhi_fwnode;
>>  	const struct device_link *link;
>>  	struct usb_port *port_dev;
>>  	struct usb_hub *hub;
>> @@ -165,11 +166,12 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>  		return 0;
>>
>>  	hub = usb_hub_to_struct_hub(udev->parent);
>> -	port_dev = hub->ports[udev->portnum - 1];
>> +	if (WARN_ON(!hub))
>> +		return 0;
>>
>> -	struct fwnode_handle *nhi_fwnode __free(fwnode_handle) =
>> -		fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
>> +	port_dev = hub->ports[udev->portnum - 1];
>>
>> +	nhi_fwnode = fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
>>  	if (IS_ERR(nhi_fwnode))
>>  		return 0;
>>
>> @@ -180,12 +182,14 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>  	if (!link) {
>>  		dev_err(&port_dev->dev, "Failed to created device link from %s to %s\n",
>>  			dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>> +		fwnode_handle_put(nhi_fwnode);
>>  		return -EINVAL;
>>  	}
>>
>> -	dev_dbg(&port_dev->dev, "Created device link from %s to %s\n",
>> -		dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>> +	dev_info(&port_dev->dev, "Created device link from %s to %s\n",
>> +		 dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>>
>> +	fwnode_handle_put(nhi_fwnode);
>>  	return 0;
>>  }
>>
>>
>
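
On the __free() question quoted above: below is a minimal userspace
sketch, assuming __free() is built on the compiler's "cleanup" variable
attribute (GCC and Clang), which calls a function when the annotated
variable goes out of scope. struct handle, handle_put(), auto_put and
use_handle() are hypothetical stand-ins for illustration only, not
kernel APIs, and the kernel's own macro definitions are not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted object such as a fwnode handle. */
struct handle {
	int refcount;
};

/* Hypothetical "put": drops one reference, tolerates NULL. */
static void handle_put(struct handle *h)
{
	if (h)
		h->refcount--;
}

/*
 * The cleanup attribute passes a pointer to the annotated variable,
 * so the callback receives a pointer to the pointer.
 */
static void handle_put_cleanup(struct handle **hp)
{
	handle_put(*hp);
}

/* Hypothetical shorthand, playing the role __free(name) plays in the patch. */
#define auto_put __attribute__((cleanup(handle_put_cleanup)))

/* Takes a reference, then returns on one of two paths. */
int use_handle(struct handle *src, int fail_early)
{
	assert(src != NULL);

	struct handle *h auto_put = src;

	h->refcount++;		/* take our own reference */

	if (fail_early)
		return -1;	/* handle_put_cleanup(&h) runs on this path */

	return 0;		/* and on this one */
}
```

In this sketch both return paths drop the reference without an explicit
put. The test patch above takes the opposite approach: it removes the
scope-based cleanup and calls fwnode_handle_put() by hand before each
return that follows a successful lookup.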