Re: [REGRESSION] usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface

On 2024-10-03 15:09, Mario Limonciello wrote:
> On 10/3/2024 13:51, Harry Wentland wrote:
>>
>>
>> On 2024-10-03 14:23, Harry Wentland wrote:
>>>
>>>
>>> On 2024-10-03 09:47, Mika Westerberg wrote:
>>>> On Thu, Oct 03, 2024 at 08:42:21AM -0500, Mario Limonciello wrote:
>>>>> On 10/3/2024 08:27, Mika Westerberg wrote:
>>>>>> On Thu, Oct 03, 2024 at 08:10:11AM -0500, Mario Limonciello wrote:
>>>>>>> On 10/3/2024 00:47, Mika Westerberg wrote:
>>>>>>>> Hi Harry,
>>>>>>>>
>>>>>>>> On Wed, Oct 02, 2024 at 01:42:29PM -0400, Harry Wentland wrote:
>>>>>>>>> I was checking out the 6.12 rc1 (through drm-next) kernel and found
>>>>>>>>> my system hung at boot. No meaningful message showed on the kernel
>>>>>>>>> boot screen.
>>>>>>>>>
>>>>>>>>> A bisect revealed the culprit to be
>>>>>>>>>
>>>>>>>>> commit f1bfb4a6fed64de1771b43a76631942279851744 (HEAD)
>>>>>>>>> Author: Mathias Nyman <mathias.nyman@xxxxxxxxxxxxxxx>
>>>>>>>>> Date:   Fri Aug 30 18:26:29 2024 +0300
>>>>>>>>>
>>>>>>>>>        usb: acpi: add device link between tunneled USB3 device and USB4 Host Interface
>>>>>>>>>
>>>>>>>>> A revert of this single patch "fixes" the issue and I can boot again.
>>>>>>>>> The system in question is a Thinkpad T14 with a Ryzen 7 PRO 6850U CPU.
>>>>>>>>> It's running Arch Linux but I doubt that's of consequence.
>>>>>>>>>
>>>>>>>>> lspci output:
>>>>>>>>>        https://gist.github.com/hwentland/59aef63d9b742b7b64d2604aae9792e0
>>>>>>>>> acpidump:
>>>>>>>>>        https://gist.github.com/hwentland/4824afc8d712c3d600be5c291f7f1089
>>>>>>>>>
>>>>>>>>> Mario suggested I try modprobe.blacklist=xhci-hcd but that did nothing.
>>>>>>>>> Another suggestion to do usbcore.nousb lets me boot to the desktop
>>>>>>>>> on a kernel with the faulty patch, without USB functionality, obviously.
>>>>>>>>>
>>>>>>>>> I'd be happy to try any patches, provide more data, or run experiments.
>>>>>>>>
>>>>>>>> Do you boot with any device connected?
>>>
>>> Great question. A Thinkpad USB-C dock. When I unplug the dock at boot it
>>> boots fine and when I plug it in later the laptop charges from it and the
>>> dock's audio output works fine.
>>>
>>> In the midst of my experiments I also noticed at one point the dock
>>> wasn't charging my laptop and hard-resetting the laptop didn't fix that.
>>> I had to unplug the dock from the wall and plug it back. So there is
>>> likely some interaction going on with this particular dock that must've
>>> sent the dock's FW into a bad state.
>>>
>>> The dmesg with the revert and thunderbolt.dyndbg=+p is here
>>> https://gist.github.com/hwentland/7e25dedd3e707fdae1185d65224d4d66
>>>
>>
>> Apologies, that dmesg was from a build with a bad .config and has some
>> FW loading errors. They seem to be unrelated though. This is a dmesg
>> from a good build. It still has a wlan FW error but that shouldn't have
>> anything to do with the problem at hand.
>>
>> https://gist.github.com/hwentland/867f7afbf3df20547a877e794a8d8e6b
>>
>>> I don't see any PCIe tunneling option in my BIOS.
>>>
>>>>>>>> Second thing that I noticed, though I'm not familiar with AMD hardware,
>>>>>>>> but from your lspci dump, I do not see the PCIe ports that are being
>>>>>>>> used to tunnel PCIe. Does this system have PCIe tunneling disabled
>>>>>>>> somehow?
>>>>>>>
>>>>>>> On some OEM systems it's possible to lock down from BIOS to turn off PCIe
>>>>>>> tunneling, and I agree that looks like the most common cause.
>>>>>>>
>>>>>>> This is what you would see on a system that has tunnels (I checked on my
>>>>>>> side w/ Z series laptop w/ Rembrandt and a dock connected):
>>>>>>>
>>>>>>>              +-03.0
>>>>>>>              +-03.1-[03-32]--
>>>>>>>              +-04.0
>>>>>>>              +-04.1-[33-62]----00.0-[34-62]--+-02.0-[35]----00.0
>>>>>>>              |                               \-04.0-[36-62]--
>>>>>>>
>>>>>>> 00:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>>> 00:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>>> 00:04.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Family
>>>>>>> 17h-19h PCIe Dummy Host Bridge [1022:14b7] (rev 01)
>>>>>>> 00:04.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Family 19h
>>>>>>> USB4/Thunderbolt PCIe tunnel [1022:14cd]
>>>>>>
>>>>>> Okay this is more like what I expected, although probably not the
>>>>>> reason here.
>>>>>>
>>>>>> Are you able to replicate the issue if you disable PCIe tunneling from
>>>>>> the BIOS on your reference system? (Probably not but just in case).
>>>>>
>>>>> I checked on the Lenovo Z13 laptop I have and turned off "USB port" in BIOS
>>>>> setup and this caused the endpoints 3.1 and 4.1 I listed above to disappear
>>>>> but the system still boots up just fine for me on 6.12-rc1.
>>>>
>>>> Okay thanks for checking!
>>>>
>>>>>>>> You don't see anything on the console? It's all blank or it just hangs
>>>>>>>> after some messages?
>>>>>>>
>>>
>>> It hangs after some messages.
>>>
>>>>>>> I guess it is getting stuck on fwnode_find_reference() because it never
>>>>>>> finds the given node?
>>>>>>
>>>>>> Looking at the code, I don't see where it could get stuck. If for some
>>>>>> reason there is no such reference (there is based on the ACPI dump) then
>>>>>> it should not affect the boot. It only matters when power management is
>>>>>> involved.
>>>>>
>>>>> Nothing jumps out to me either.  Maybe this is a situation that Harry can
>>>>> sprinkle a bunch of printk's all over usb_acpi_add_usb4_devlink() to
>>>>> enlighten what's going on (assuming the console output is "working" when
>>>>> this happened).
>>>>
>>
>> I sprinkled printks but don't see any on the console.
>>
> 
> You said it can work properly without the revert if you don't boot with the dock plugged in?
> 

It can work properly without the revert if I boot without the dock plugged in.

> How about if you unplug it, does it unhang and you get everything flushed to the console?
> 

Nothing happens.

> Or maybe magic sysrq with a backtrace (l) can help see where something is spinning.

Nothing happens. CONFIG_MAGIC_SYSRQ is enabled in my kernel.

Harry

> 
>> Harry
>>
>>>> There are a couple of places there that may cause it to crash, I think.
>>>> And the __free() magic is something I cannot wrap my head around :(
>>>>
>>>> Anyways, Harry can you try the below patch and see if it makes any
>>>> difference? Also if it does please provide dmesg.
>>>>
>>>
>>> The patch doesn't seem to make a difference. Same hang on boot.
>>>
>>> Harry
>>>
>>>> diff --git a/drivers/usb/core/usb-acpi.c b/drivers/usb/core/usb-acpi.c
>>>> index 21585ed89ef8..90360f7ca905 100644
>>>> --- a/drivers/usb/core/usb-acpi.c
>>>> +++ b/drivers/usb/core/usb-acpi.c
>>>> @@ -157,6 +157,7 @@ EXPORT_SYMBOL_GPL(usb_acpi_set_power_state);
>>>>    */
>>>>   static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>>   {
>>>> +    struct fwnode_handle *nhi_fwnode;
>>>>       const struct device_link *link;
>>>>       struct usb_port *port_dev;
>>>>       struct usb_hub *hub;
>>>> @@ -165,11 +166,12 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>>           return 0;
>>>>
>>>>       hub = usb_hub_to_struct_hub(udev->parent);
>>>> -    port_dev = hub->ports[udev->portnum - 1];
>>>> +    if (WARN_ON(!hub))
>>>> +        return 0;
>>>>
>>>> -    struct fwnode_handle *nhi_fwnode __free(fwnode_handle) =
>>>> -        fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
>>>> +    port_dev = hub->ports[udev->portnum - 1];
>>>>
>>>> +    nhi_fwnode = fwnode_find_reference(dev_fwnode(&port_dev->dev), "usb4-host-interface", 0);
>>>>       if (IS_ERR(nhi_fwnode))
>>>>           return 0;
>>>>
>>>> @@ -180,12 +182,14 @@ static int usb_acpi_add_usb4_devlink(struct usb_device *udev)
>>>>       if (!link) {
>>>>           dev_err(&port_dev->dev, "Failed to created device link from %s to %s\n",
>>>>               dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>>>> +        fwnode_handle_put(nhi_fwnode);
>>>>           return -EINVAL;
>>>>       }
>>>>
>>>> -    dev_dbg(&port_dev->dev, "Created device link from %s to %s\n",
>>>> -        dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>>>> +    dev_info(&port_dev->dev, "Created device link from %s to %s\n",
>>>> +         dev_name(&port_dev->child->dev), dev_name(nhi_fwnode->dev));
>>>>
>>>> +    fwnode_handle_put(nhi_fwnode);
>>>>       return 0;
>>>>   }
>>>>  
>>>
>>
> 
