Hi, On Thu, Sep 8, 2022 at 11:22 PM Limonciello, Mario <mario.limonciello@xxxxxxx> wrote: [snipped] > > Nice to hear. Hopefully this can be fixed at firmware/hardware side. > > I guess you and Anson might want to sync up offline and compare whether > you have the same hardware stepping. Sure. [snipped] > > We used UEFI capsule to update the firmware, via fwupd. > > So that's a difference from how Anson did it. Could you perhaps dump > the BIOS SPI image? Anson can flash the exact same dump via dediprog > and see if he can repro. > > It would let us confirm if it was caused by your upgrade path. OK, will dissuss this with AMD/HP offline. > > > > >> > >> 2) Did you change any BIOS settings? Particularly anything to do with Pre-OS CM? > > > > No, nothing in BIOS was changed. > > >> > >> 3) If you explicitly reset to HP's "default BIOS settings" does it resolve? > > > > Doesn't help. I put the device to ACPI G3 and it doesn't help, either. > > OK. > > > > >> > >> 4) Can you double check ADP_CS_5 bit 31? I attached is a patch to kernel Bugzilla to > >> add dyndbg output for it. If it was for some reason set by Pre-OS CM in your BIOS/settings > >> combination, we might need to undo it by the Linux CM. > > > > All ports say "Hotplug disabled: 0". > > > > dmesg attached to the bugzilla. > > OK, that at least rules out DHP from Pre-OS CM. > > > > >> > >> 5) Are you changing any of the default runtime PM policies for any of the USB4 routers or > >> root ports used for tunneling using software like TLP? > > > > No. And they should be suspended by default. > > > > Thinking about this being possibly a firmware upgrade path problem, can > you please check: > > # grep SMC /sys/kernel/debug/dri/0/amdgpu_firmware_info > > Anson's system was 0x04453200 (program 4, version 69.50.0). Exactly the same here. Kai-Heng > > > Kai-Heng > > > >> > >>> > >>>> > >>>> Bugzilla: > >>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbugz > >>> illa.kernel.org%2Fshow_bug.cgi%3Fid%3D216448&data=05%7C01%7Cm > >>> ario.limonciello%40amd.com%7C1e27b1d6f69e42796c7b08da8f107121%7C3d > >>> d8961fe4884e608e11a82d994e183d%7C0%7C0%7C637979598042186185%7CU > >>> nknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI > >>> 6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7C&sdata=0lhcaKfUyoK > >>> 0FXT9uDZ8a%2Fpxs9tHd8aoQcyPFdB%2F0eY%3D&reserved=0 > >>>> Signed-off-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx> > >>>> --- > >>>> drivers/thunderbolt/nhi.c | 29 +++++++++++++++++++++++++++++ > >>>> drivers/thunderbolt/switch.c | 6 ++++++ > >>>> drivers/thunderbolt/tb.c | 1 + > >>>> drivers/thunderbolt/tb.h | 5 +++++ > >>>> include/linux/thunderbolt.h | 1 + > >>>> 5 files changed, 42 insertions(+) > >>>> > >>>> diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c > >>>> index cb8c9c4ae93a2..75f5ce5e22978 100644 > >>>> --- a/drivers/thunderbolt/nhi.c > >>>> +++ b/drivers/thunderbolt/nhi.c > >>>> @@ -1225,6 +1225,8 @@ static int nhi_probe(struct pci_dev *pdev, const > >>> struct pci_device_id *id) > >>>> { > >>>> struct tb_nhi *nhi; > >>>> struct tb *tb; > >>>> + struct pci_dev *p = NULL; > >>>> + struct tb_pci_bridge *pci_bridge, *n; > >>>> int res; > >>>> > >>>> if (!nhi_imr_valid(pdev)) { > >>>> @@ -1306,6 +1308,19 @@ static int nhi_probe(struct pci_dev *pdev, const > >>> struct pci_device_id *id) > >>>> nhi_shutdown(nhi); > >>>> return res; > >>>> } > >>>> + > >>>> + if (pdev->vendor == PCI_VENDOR_ID_AMD) { > >>>> + while ((p = pci_get_device(PCI_VENDOR_ID_AMD, 0x14cd, > >>> p))) { > >>>> + pci_bridge = kmalloc(sizeof(struct tb_pci_bridge), > >>> GFP_KERNEL); > >>>> + if (!pci_bridge) > >>>> + goto cleanup; > >>>> + > >>>> + pci_bridge->bridge = p; > >>>> + INIT_LIST_HEAD(&pci_bridge->list); > >>>> + list_add(&pci_bridge->list, &tb->bridge_list); > >>>> + } > >>>> + } > >>> > >>> You can't walk the device tree and create a "shadow" list of devices > >>> like this and expect any lifetime rules to work properly with them at > >>> all. > >>> > >>> Please do not do this. > >>> > >>> greg k-h >