Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface

On 3.10.2024 16.47, Mika Westerberg wrote:
On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
On 10/3/2024 08:27, Mika Westerberg wrote:
On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
On 10/3/2024 00:47, Mika Westerberg wrote:
Hi Harry,

On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
I was checking out the 6.12 rc1 (through drm-next) kernel and found
my system hung at boot. No meaningful message showed on the kernel
boot screen.

A bisect revealed the culprit to be

commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
Author: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx>
Date:   Fri Aug 30 18:26:29 2024 +0300

       usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface

A revert of this single patch "fixes" the issue and I can boot again.
The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
It's running Arch Linux but I doubt that's of consequence.

lspci output:
       https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
acpidump:
       https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089

Mario suggested I try modprobe.blacklist=xhci-hcd, but that did nothing.
Another suggestion, usbcore.nousb, lets me boot to the desktop on a
kernel with the faulty patch, though obviously without USB functionality.

I'd be happy to try any patches, provide more data, or run experiments.

Do you boot with any device connected?
The second thing I noticed (though I'm not familiar with AMD hardware) is
that your lspci dump does not show the PCIe ports that are used to tunnel
PCIe. Does this system have PCIe tunneling disabled somehow?

On some OEM systems it's possible to turn off PCIe tunneling from the BIOS,
and I agree that looks like the most common cause.

This is what you would see on a system that has tunnels (I checked on my
side with a Z series laptop with Rembrandt and a dock connected):

             +-03.0
             +-03.1-[03-32]--
             +-04.0
             +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
             |                               \-04.0-[36-62]--

00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel [1022:14cd]
00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h USB4/Thunderbolt PCIe tunnel [1022:14cd]
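To answer the "is PCIe tunneling disabled?" question without reading the whole lspci tree, one could look for the AMD USB4/Thunderbolt PCIe tunnel bridges [1022:14cd] listed above directly in sysfs. This is a minimal userspace sketch, assuming the standard /sys/bus/pci/devices/<BDF>/vendor and device attribute files (which hold hex strings such as "0x1022"). The helper names are hypothetical, and the ID check only covers the AMD Family 19h tunnel bridge shown here, not other vendors' USB4 controllers.

```c
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: read a sysfs attribute such as "0x1022\n" and parse
 * it as hex. Returns -1 if the file cannot be read or parsed. */
static long read_hex_attr(const char *dir, const char *name, const char *attr)
{
	char path[512];
	char buf[32];
	char *end;
	long val;
	FILE *f;

	snprintf(path, sizeof(path), "%s/%s/%s", dir, name, attr);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);

	/* Base 16 accepts the optional "0x" prefix; the trailing newline stops parsing. */
	val = strtol(buf, &end, 16);
	return end == buf ? -1 : val;
}

/* Hypothetical predicate: AMD (0x1022) Family 19h USB4/Thunderbolt PCIe tunnel (0x14cd). */
static int is_amd_usb4_pcie_tunnel(long vendor, long device)
{
	return vendor == 0x1022 && device == 0x14cd;
}

/* Count tunnel bridges under a PCI devices directory, normally
 * "/sys/bus/pci/devices". Returns -1 if the directory cannot be opened. */
static int count_usb4_pcie_tunnels(const char *dir)
{
	DIR *d = opendir(dir);
	struct dirent *de;
	int count = 0;

	if (!d)
		return -1;
	while ((de = readdir(d)) != NULL) {
		if (de->d_name[0] == '.')
			continue;	/* skip "." and ".." */
		if (is_amd_usb4_pcie_tunnel(read_hex_attr(dir, de->d_name, "vendor"),
					    read_hex_attr(dir, de->d_name, "device")))
			count++;
	}
	closedir(d);
	return count;
}
```

On a system like the one above, count_usb4_pcie_tunnels("/sys/bus/pci/devices") would be expected to find the bridges at 00:03.1 and 00:04.1; a result of zero on a USB4-capable AMD laptop would be consistent with tunneling being turned off in the firmware.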

Okay, this is more like what I expected, although probably not the
reason here.

Are you able to replicate the issue if you disable PCIe tunneling from
the BIOS on your reference system? (Probably not but just in case).

I checked on the Lenovo Z13 laptop I have: turning off "USB port" in BIOS
setup caused the bridges 00:03.1 and 00:04.1 I listed above to disappear,
but the system still boots up just fine for me on 6.12-rc1.

Okay, thanks for checking!

You don't see anything on the console? Is it all blank, or does it hang
after some messages?

I guess it is getting stuck on fwnode_find_reference() because it never
finds the given node?

Looking at the code, I don't see where it could get stuck. If for some
reason there is no such reference (there is one, based on the ACPI dump),
it should not affect the boot. It only matters when power management is
involved.
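The "no reference, no effect" argument rests on the kernel's error-pointer convention: fwnode_find_reference() returns either a valid handle or an errno encoded in the pointer value, and usb_acpi_add_usb4_devlink() bails out with return 0 when IS_ERR() is true. This is a userspace re-creation of ERR_PTR()/PTR_ERR()/IS_ERR() modelled on include/linux/err.h (where MAX_ERRNO is 4095). The fake_fwnode type, lookup_reference() and add_devlink_sketch() are hypothetical stand-ins, not kernel functions.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace copies of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

/* The top MAX_ERRNO addresses are reserved for encoded errno values. */
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical stand-in for a firmware node handle. */
struct fake_fwnode {
	const char *name;
};

static struct fake_fwnode nhi_node = { "NHI0" };

/* Hypothetical lookup: the node, or ERR_PTR(-ENOENT) when the ACPI
 * tables carry no "usb4-host-interface" reference. */
static struct fake_fwnode *lookup_reference(int present)
{
	return present ? &nhi_node : ERR_PTR(-ENOENT);
}

/* Same early-return shape as the real function. Returns 1 if this is the
 * point where a device link would be created, 0 if nothing is done. */
static int add_devlink_sketch(int reference_present)
{
	struct fake_fwnode *nhi = lookup_reference(reference_present);

	if (IS_ERR(nhi))
		return 0;	/* no reference: leave the device alone */
	return 1;		/* reference found: device_link_add() would follow */
}
```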

Nothing jumps out at me either. Maybe Harry can sprinkle a bunch of
printk()s all over usb_acpi_add_usb4_devlink() to shed light on what's
going on (assuming the console output is "working" when this happens).

There are a couple of places in there that may cause it to crash, I think.

It's possible that we end up trying to create a device link during
enumeration of the USB3 "consumer" device, before the "supplier" NHI
device is properly bound to a driver.

This is something driver-api/device_link.rst states can cause issues.

This could happen if xhci isn't capable of detecting tunneled devices,
but the ACPI tables contain all the info needed to assume the device
might be tunneled, i.e. udev->tunnel_mode == USB_LINK_UNKNOWN.

Harry, could you test if the code below helps?

diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
index 21585ed89ef8..94c335a7b933 100644
--- a/drivers/usb/core/usb-acpi.c
+++ b/drivers/usb/core/usb-acpi.c
@@ -173,6 +173,13 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
 	if (IS_ERR(nhi_fwnode))
 		return 0;
 
+	if (!nhi_fwnode->dev || !device_is_bound(nhi_fwnode->dev)) {
+		dev_info(&port_dev->dev, "%s not tunneled as it probed before USB4 Host Interface\n",
+			 dev_name(&port_dev->child->dev));
+		udev->tunnel_mode = USB_LINK_NATIVE;
+		return 0;
+	}
+
 	link = device_link_add(&port_dev->child->dev, nhi_fwnode->dev,
 			       DL_FLAG_AUTOREMOVE_CONSUMER |
 			       DL_FLAG_RPM_ACTIVE |
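Stripped of kernel types, the patch above turns the link setup into a three-way decision: no NHI reference means nothing to do, a referenced NHI that is missing or not yet bound means the device is treated as native (not tunneled), and only a bound NHI proceeds to device_link_add(). This is a minimal userspace model of that decision. The fake_device struct, the link_mode and devlink_action enums and decide_devlink() are hypothetical and only mimic struct device, device_is_bound() and the USB_LINK_* tunnel modes.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mirrors of the USB_LINK_* tunnel modes. */
enum link_mode { LINK_UNKNOWN, LINK_NATIVE, LINK_TUNNELED };

/* Hypothetical stand-in for struct device: only tracks driver binding. */
struct fake_device {
	bool bound;
};

/* What the modelled function ends up doing, for illustration only. */
enum devlink_action { SKIP_NO_REFERENCE, SKIP_SUPPLIER_NOT_BOUND, CREATE_LINK };

/*
 * has_reference: whether the "usb4-host-interface" reference was found.
 * nhi:           the supplier device, NULL if none has been created yet.
 * mode:          the consumer's tunnel mode, updated as a side effect.
 */
static enum devlink_action decide_devlink(bool has_reference,
					  const struct fake_device *nhi,
					  enum link_mode *mode)
{
	if (!has_reference)
		return SKIP_NO_REFERENCE;	/* mode left untouched */

	if (nhi == NULL || !nhi->bound) {
		/* Supplier not ready: skip the link and assume native. */
		*mode = LINK_NATIVE;
		return SKIP_SUPPLIER_NOT_BOUND;
	}

	/* Supplier bound: this is where the real code calls device_link_add();
	 * the model simply reports the device as tunneled. */
	*mode = LINK_TUNNELED;
	return CREATE_LINK;
}
```

Keeping the "not bound" case separate from the "no reference" case is the point of the patch: a link is never created towards a supplier that has no driver yet, at the cost of treating such a device as native.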







