On 2024-10-03 09:47, Mika Westerberg wrote:
> On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
>> On 10/3/2024 08:27, Mika Westerberg wrote:
>>> On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
>>>> On 10/3/2024 00:47, Mika Westerberg wrote:
>>>>> Hi Harry,
>>>>>
>>>>> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>>>>>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>>>>>> my system hung at boot. No meaningful message showed on the kernel
>>>>>> boot screen.
>>>>>>
>>>>>> A bisect revealed the culprit to be
>>>>>>
>>>>>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>>>>>> Author: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx>
>>>>>> Date: Fri Aug 30 18:26:29 2024 +0300
>>>>>>
>>>>>> usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
>>>>>>
>>>>>> A revert of this single patch "fixes" the issue and I can boot again.
>>>>>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
>>>>>> It's running Arch Linux but I doubt that's of consequence.
>>>>>>
>>>>>> lspci output:
>>>>>> https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>>>>>> acpidump:
>>>>>> https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>>>>>
>>>>>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
>>>>>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>>>>>> on a kernel with the faulty patch, without USB functionality, obviously.
>>>>>>
>>>>>> I'd be happy to try any patches, provide more data, or run experiments.
>>>>>
>>>>> Do you boot with any device connected?

Great question. A Thinkpad USB-C dock. When I unplug the dock at boot it
boots fine and when I plug it in later the laptop charges from it and the
dock's audio output work fine. In the midst of my experiments I also
noticed at one point the dock wasn't charging my laptop and hard-resetting
the laptop didn't fix that.
I had to unplug the dock from the wall and plug it back. So there is
likely some interaction going on with this particular dock that must've
sent the dock's FW into a bad state.

The dmesg with the revert and thunderbolt.dyndbg=+p is here
https://gist.github.com/hwentland/7e25dedd3e707fdae1185d65224d4d66

I don't see any PCIe tunneling option in my BIOS.

>>>>> Second thing that I noticed, though I'm not familiar with AMD hardware,
>>>>> but from your lspci dump, I do not see the PCIe ports that are being
>>>>> used to tunnel PCIe. Does this system have PCIe tunneling disabled
>>>>> somehow?
>>>>
>>>> On some OEM systems it's possible to lock down from BIOS to turn off PCIe
>>>> tunneling, and I agree that looks like the most common cause.
>>>>
>>>> This is what you would see on a system that has tunnels (I checked on my
>>>> side w/ Z series laptop w/ Rembrandt and a dock connected):
>>>>
>>>> +-03.0
>>>> +-03.1-[03-32]--
>>>> +-04.0
>>>> +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>>>> |                               \-04.0-[36-62]--
>>>>
>>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>
>>> Okay this is more like what I expected, although probably not the
>>> reason here.
>>>
>>> Are you able to replicate the issue if you disable PCIe tunneling from
>>> the BIOS on your reference system? (Probably not but just in case).
>>
>> I checked on the Lenovo Z13 laptop I have and turned off "USB port" in BIOS
>> setup and this caused the endpoints 3.1 and 4.1 I listed above to disappear
>> but the system still boots up just fine for me on 6.12-rc1.
>
> Okay thanks for checking!
>
>>>>> You don't see anything on the console? It's all blank or it just hangs
>>>>> after some messages?
>>>>

It hangs after some messages.

>>>> I guess it is getting stuck on fwnode_find_reference() because it never
>>>> finds the given node?
>>>
>>> Looking at the code, I don't see where it could get stuck. If for some
>>> reason there is no such reference (there is based on the ACPI dump) then
>>> it should not affect the boot. It only matters when power management is
>>> involved.
>>
>> Nothing jumps out to me either. Maybe this is a situation that Harry can
>> sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to
>> enlighten what's going on (assuming the console output is "working" when
>> this happened).
>
> There are couple of places there that may cause it to crash, I think.
> And the __free() magic is something I cannot wrap my head around :(
>
> Anyways, Harry can you try the below patch and see if it makes any
> difference? Also if it does please provide dmesg.
>

The patch doesn't seem to make a difference. Same hang on boot.
Harry

> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
> index 21585ed89ef8..90360f7ca905 100644
> --- a/drivers/usb/core/usb-acpi.c
> +++ b/drivers/usb/core/usb-acpi.c
> @@ -157,6 +157,7 @@ EXPORT_SYMBOL_GPL(usb_acpi_set_power_state);
>   */
>  static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>  {
> +	struct fwnode_handle *nhi_fwnode;
>  	const struct device_link *link;
>  	struct usb_port *port_dev;
>  	struct usb_hub *hub;
> @@ -165,11 +166,12 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>  		return 0;
>  
>  	hub = usb_hub_to_struct_hub(udev->parent);
> -	port_dev = hub->ports[udev->portnum - 1];
> +	if (WARN_ON(!hub))
> +		return 0;
>  
> -	struct fwnode_handle *nhi_fwnode __free(fwnode_handle) =
> -		fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
> +	port_dev = hub->ports[udev->portnum - 1];
>  
> +	nhi_fwnode = fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
>  	if (IS_ERR(nhi_fwnode))
>  		return 0;
>  
> @@ -180,12 +182,14 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>  	if (!link) {
>  		dev_err(&port_dev->dev, "Failed to created device link from %s to %s\n",
>  			dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
> +		fwnode_handle_put(nhi_fwnode);
>  		return -EINVAL;
>  	}
>  
> -	dev_dbg(&port_dev->dev, "Created device link from %s to %s\n",
> -		dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
> +	dev_info(&port_dev->dev, "Created device link from %s to %s\n",
> +		 dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>  
> +	fwnode_handle_put(nhi_fwnode);
>  	return 0;
>  }
>  
>
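
A note on the __free() annotation that the quoted patch removes: as far
as I understand it, the kernel's __free(fwnode_handle) is built on the
compiler's cleanup attribute, which calls a function automatically when
the annotated variable goes out of scope. Below is a minimal userspace
sketch of that idea for GCC or Clang. It is not kernel code, and
struct node, node_get(), node_put(), node_put_cleanup(), auto_put and
use_node() are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical refcounted object standing in for a firmware node. */
struct node {
	int refcount;
};

/* Take a reference (hypothetical helper). */
static struct node *node_get(struct node *n)
{
	n->refcount++;
	return n;
}

/* Drop a reference (hypothetical helper); tolerates NULL. */
static void node_put(struct node *n)
{
	if (n)
		n->refcount--;
}

/* The cleanup attribute hands us a pointer to the annotated variable. */
static void node_put_cleanup(struct node **p)
{
	node_put(*p);
}

/* Assumed shorthand, loosely analogous to the kernel's __free(name). */
#define auto_put __attribute__((cleanup(node_put_cleanup)))

static int use_node(struct node *shared, int fail_early)
{
	struct node *n auto_put = node_get(shared);

	/* We hold a reference for the whole scope of n. */
	assert(n->refcount > 0);

	if (fail_early)
		return -1;	/* node_put_cleanup(&n) runs on this return */

	printf("refcount while in use: %d\n", n->refcount);
	return 0;		/* ... and on this one */
}
```

Calling use_node() with fail_early set to either 1 or 0 should leave the
caller's refcount where it started. The quoted patch aims for the same
balance with explicit fwnode_handle_put() calls on the return paths that
hold a reference.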