On 2024-10-10 08:01, Mathias Nyman wrote:
> On 10.10.2024 5.23, Mario Limonciello wrote:
>> On 10/9/2024 16:52, Mathias Nyman wrote:
>>> On 3.10.2024 16.47, Mika Westerberg wrote:
>>>> On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
>>>>> On 10/3/2024 08:27, Mika Westerberg wrote:
>>>>>> On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
>>>>>>> On 10/3/2024 00:47, Mika Westerberg wrote:
>>>>>>>> Hi Harry,
>>>>>>>>
>>>>>>>> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>>>>>>>>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>>>>>>>>> my system hung at boot. No meaningful message showed on the kernel
>>>>>>>>> boot screen.
>>>>>>>>>
>>>>>>>>> A bisect revealed the culprit to be
>>>>>>>>>
>>>>>>>>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>>>>>>>>> Author: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx>
>>>>>>>>> Date: Fri Aug 30 18:26:29 2024 +0300
>>>>>>>>>
>>>>>>>>> usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
>>>>>>>>>
>>>>>>>>> A revert of this single patch "fixes" the issue and I can boot again.
>>>>>>>>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
>>>>>>>>> It's running Arch Linux but I doubt that's of consequence.
>>>>>>>>>
>>>>>>>>> lspci output:
>>>>>>>>> https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>>>>>>>>> acpidump:
>>>>>>>>> https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>>>>>>>>
>>>>>>>>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
>>>>>>>>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>>>>>>>>> on a kernel with the faulty patch, without USB functionality, obviously.
>>>>>>>>>
>>>>>>>>> I'd be happy to try any patches, provide more data, or run experiments.
>>>>>>>>
>>>>>>>> Do you boot with any device connected?
>>>>>>>> Second thing that I noticed, though I'm not familiar with AMD hardware,
>>>>>>>> but from your lspci dump, I do not see the PCIe ports that are being
>>>>>>>> used to tunnel PCIe. Does this system have PCIe tunneling disabled
>>>>>>>> somehow?
>>>>>>>
>>>>>>> On some OEM systems it's possible to lock down from BIOS to turn off PCIe
>>>>>>> tunneling, and I agree that looks like the most common cause.
>>>>>>>
>>>>>>> This is what you would see on a system that has tunnels (I checked on my
>>>>>>> side w/ Z series laptop w/ Rembrandt and a dock connected):
>>>>>>>
>>>>>>> +-03.0
>>>>>>> +-03.1-[03-32]--
>>>>>>> +-04.0
>>>>>>> +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>>>>>>> |                               \-04.0-[36-62]--
>>>>>>>
>>>>>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>>> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>>
>>>>>> Okay this is more like what I expected, although probably not the
>>>>>> reason here.
>>>>>>
>>>>>> Are you able to replicate the issue if you disable PCIe tunneling from
>>>>>> the BIOS on your reference system? (Probably not but just in case).
>>>>>
>>>>> I checked on the Lenovo Z13 laptop I have and turned off "USB port" in BIOS
>>>>> setup and this caused the endpoints 3.1 and 4.1 I listed above to disappear
>>>>> but the system still boots up just fine for me on 6.12-rc1.
>>>>
>>>> Okay thanks for checking!
>>>>
>>>>>>>> You don't see anything on the console? It's all blank or it just hangs
>>>>>>>> after some messages?
>>>>>>>
>>>>>>> I guess it is getting stuck on fwnode_find_reference() because it never
>>>>>>> finds the given node?
>>>>>>
>>>>>> Looking at the code, I don't see where it could get stuck. If for some
>>>>>> reason there is no such reference (there is based on the ACPI dump) then
>>>>>> it should not affect the boot. It only matters when power management is
>>>>>> involved.
>>>>>
>>>>> Nothing jumps out to me either. Maybe this is a situation that Harry can
>>>>> sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to
>>>>> enlighten what's going on (assuming the console output is "working" when
>>>>> this happened).
>>>>
>>>> There are couple of places there that may cause it to crash, I think.
>>>
>>> Its possible we end up trying to create a device link during usb3 device
>>> "consumer" enumeration before the "supplier" NHI device is properly bound to a driver.
>>>
>>> This is something driver-api/device_link.rst states can cause issues.
>>>
>>> This could happen if xhci isn't capable of detecting tunneled devices,
>>> but ACPI tables contain all info needed to assume device might be tunneled.
>>> i.e. udev->tunnel_mode == USB_LINK_UNKNOWN.
>>>
>>> Harry, could you test if the code below helps?
>>>
>>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
>>> index 21585ed89ef8..94c335a7b933 100644
>>> --- a/drivers/usb/core/usb-acpi.c
>>> +++ b/drivers/usb/core/usb-acpi.c
>>> @@ -173,6 +173,13 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>          if (IS_ERR(nhi_fwnode))
>>>                  return 0;
>>>
>>> +        if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev)) {
>>> +                dev_info(&port_dev->dev, "%s not tunneled as it probed before USB4 Host Interface\n",
>>
>> I'm aware this message is mostly to prove whether this is the actual issue but I do want to say if this patch indeed helps Harry's problem and you keep a message in what goes upstream I don't think this is accurate for all cases.
>>
>> If you have a Pre-OS CM, it might build tunnels and those could be active until the USB4 CM loads and resets them (by the default behavior).
>>
>> So I think a more accurate message would just be "%s probed before USB4 host interface".
>
> Makes sense, I'll tune the message in the final patch if this works
>

Apologies for the late response. I was traveling last week.

This patch does the trick, i.e., no more hangs on boot when connected to
the Lenovo USB dock.

Harry

> Thanks
> Mathias
>
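
For anyone following along, the guard in the test patch can be modeled
outside the kernel roughly as below. This is only a user-space sketch:
struct mock_device, struct mock_fwnode, enum link_result and
try_add_usb4_devlink() are hypothetical stand-ins, not kernel APIs, and
the "bound" flag merely plays the role of device_is_bound(). The quoted
diff is cut off after the dev_info() call, so the sketch assumes the
patched function then skips creating the device link.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins, NOT the kernel's struct device
 * or struct fwnode_handle. */
struct mock_device {
	bool bound;			/* plays the role of device_is_bound() */
};

struct mock_fwnode {
	struct mock_device *dev;	/* NULL until a device is attached */
};

enum link_result {
	LINK_SKIPPED,	/* supplier missing or not bound yet: no link */
	LINK_CREATED,	/* supplier ready: a device link would be made */
};

/*
 * Mirrors the shape of the check in the test patch:
 *   if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev))
 * Assumption: the skip path creates no link (the diff excerpt ends
 * before showing what follows the log message).
 */
enum link_result try_add_usb4_devlink(const struct mock_fwnode *nhi_fwnode)
{
	if (nhi_fwnode == NULL)
		return LINK_SKIPPED;

	/* Consumer enumerated before the supplier (NHI) was bound. */
	if (nhi_fwnode->dev == NULL || !nhi_fwnode->dev->bound)
		return LINK_SKIPPED;

	return LINK_CREATED;
}
```

In this model a node with no device, or with a device that is not yet
bound, yields LINK_SKIPPED; only a bound supplier yields LINK_CREATED.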