" On Thu, Sep 8, 2022 at 12:30 AM Limonciello, Mario <Mario.Limonciello@xxxxxxx> wrote: > > [Public] > > Hi, > > > -----Original Message----- > > From: Greg KH <gregkh@xxxxxxxxxxxxxxxxxxx> > > Sent: Monday, September 5, 2022 02:30 > > To: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx> > > Cc: mika.westerberg@xxxxxxxxxxxxxxx; andreas.noever@xxxxxxxxx; > > michael.jamet@xxxxxxxxx; YehezkelShB@xxxxxxxxx; Mehta, Sanju > > <Sanju.Mehta@xxxxxxx>; Limonciello, Mario > > <Mario.Limonciello@xxxxxxx>; linux-usb@xxxxxxxxxxxxxxx; linux- > > kernel@xxxxxxxxxxxxxxx > > Subject: Re: [PATCH] thunderbolt: Resume PCIe bridges after switch is found > > on AMD USB4 controller > > > > On Mon, Sep 05, 2022 at 02:56:22PM +0800, Kai-Heng Feng wrote: > > > AMD USB4 can not detect external PCIe devices like external NVMe when > > > it's hotplugged, because card/link are not up: > > > > > > pcieport 0000:00:04.1: pciehp: pciehp_check_link_active: lnk_status = 1101 > > > > That sounds like a hardware bug, how does this work in other operating > > systems for this hardware? > > We happen to have this HP system in our lab. My colleague Anson (now on CC) flashed > the same BIOS to it (01.02.01) using dediprog and loaded a 6.0-rc3 mainline kernel built > from the Canonical mainline kernel PPA. > > He then tried to hotplug a TBT3 SSD a number of times but couldn't hit this issue. > I attached his log to the kernel Bugzilla. Nice to hear. Hopefully this can be fixed at firmware/hardware side. > > > > > > Use `lspci` to resume pciehp bridges can find external devices. > > > > That's not good :( > > > > > A long delay before checking card/link presence doesn't help, either. > > > The only way to make the hotplug work is to enable pciehp interrupt and > > > check card presence after the TB switch is added. > > > > > > Since the topology of USB4 and its PCIe bridges are siblings, hardcode > > > the bridge ID so TBT driver can wake them up to check presence. > > > > As I mention below, this is not an acceptable solution. > > > > AMD developers, any ideas on how to get this fixed in the TB controller > > firware instead? > > Anson also double checked on the AMD reference hardware that the HP system is built > against and couldn't reproduce it there either. > > KH, I've got a few questions/comments to try to better explain why we're here. > > 1) How did you flash the 01.02.01 firmware? In Anson's check, he used dediprog. > Is it possible there was some stateful stuff used by HP's BIOS still on the SPI from the > upgrade that didn't get set/cleared properly from an earlier pre-release BIOS? We used UEFI capsule to update the firmware, via fwupd. > > 2) Did you change any BIOS settings? Particularly anything to do with Pre-OS CM? No, nothing in BIOS was changed. > > 3) If you explicitly reset to HP's "default BIOS settings" does it resolve? Doesn't help. I put the device to ACPI G3 and it doesn't help, either. > > 4) Can you double check ADP_CS_5 bit 31? I attached is a patch to kernel Bugzilla to > add dyndbg output for it. If it was for some reason set by Pre-OS CM in your BIOS/settings > combination, we might need to undo it by the Linux CM. All ports say "Hotplug disabled: 0". dmesg attached to the bugzilla. > > 5) Are you changing any of the default runtime PM policies for any of the USB4 routers or > root ports used for tunneling using software like TLP? No. And they should be suspended by default. Kai-Heng > > > > > > > > > Bugzilla: > > https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugz > > illa.kernel.org%2Fshow_bug.cgi%3Fid%3D216448&data=05%7C01%7Cm > > ario.limonciello%40amd.com%7C1e27b1d6f69e42796c7b08da8f107121%7C3d > > d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637979598042186185%7CU > > nknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI > > 6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=0lhcaKfUyoK > > 0FXT9uDZ8a%2Fpxs9tHd8aoQcyPFdB%2F0eY%3D&reserved=0 > > > Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx> > > > --- > > > drivers/thunderbolt/nhi.c | 29 +++++++++++++++++++++++++++++ > > > drivers/thunderbolt/switch.c | 6 ++++++ > > > drivers/thunderbolt/tb.c | 1 + > > > drivers/thunderbolt/tb.h | 5 +++++ > > > include/linux/thunderbolt.h | 1 + > > > 5 files changed, 42 insertions(+) > > > > > > diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c > > > index cb8c9c4ae93a2..75f5ce5e22978 100644 > > > --- a/drivers/thunderbolt/nhi.c > > > +++ b/drivers/thunderbolt/nhi.c > > > @@ -1225,6 +1225,8 @@ static int nhi_probe(struct pci_dev *pdev, const > > struct pci_device_id *id) > > > { > > > struct tb_nhi *nhi; > > > struct tb *tb; > > > + struct pci_dev *p = NULL; > > > + struct tb_pci_bridge *pci_bridge, *n; > > > int res; > > > > > > if (!nhi_imr_valid(pdev)) { > > > @@ -1306,6 +1308,19 @@ static int nhi_probe(struct pci_dev *pdev, const > > struct pci_device_id *id) > > > nhi_shutdown(nhi); > > > return res; > > > } > > > + > > > + if (pdev->vendor == PCI_VENDOR_ID_AMD) { > > > + while ((p = pci_get_device(PCI_VENDOR_ID_AMD, 0x14cd, > > p))) { > > > + pci_bridge = kmalloc(sizeof(struct tb_pci_bridge), > > GFP_KERNEL); > > > + if (!pci_bridge) > > > + goto cleanup; > > > + > > > + pci_bridge->bridge = p; > > > + INIT_LIST_HEAD(&pci_bridge->list); > > > + list_add(&pci_bridge->list, &tb->bridge_list); > > > + } > > > + } > > > > You can't walk the device tree and create a "shadow" list of devices > > like this and expect any lifetime rules to work properly with them at > > all. > > > > Please do not do this. > > > > greg k-h