Hi On 6/8/2020 11:17 PM, Marcos Scriven wrote: > On Thu, 28 May 2020 at 09:12, Marcos Scriven <marcos@xxxxxxxxxxx> wrote: >> On Wed, 27 May 2020 at 22:42, Deucher, Alexander >> <Alexander.Deucher@xxxxxxx> wrote: >>> [AMD Official Use Only - Internal Distribution Only] >>> >>>> -----Original Message----- >>>> From: Bjorn Helgaas <helgaas@xxxxxxxxxx> >>>> Sent: Wednesday, May 27, 2020 5:32 PM >>>> To: Kevin Buettner <kevinb@xxxxxxxxxx> >>>> Cc: linux-pci@xxxxxxxxxxxxxxx; Bjorn Helgaas <bhelgaas@xxxxxxxxxx>; Alex >>>> Williamson <alex.williamson@xxxxxxxxxx>; Deucher, Alexander >>>> <Alexander.Deucher@xxxxxxx>; Koenig, Christian >>>> <Christian.Koenig@xxxxxxx> >>>> Subject: Re: [PATCH] PCI: Avoid FLR for AMD Starship USB 3.0 >>>> >>>> [+cc Alex D, Christian -- do you guys have any contacts or insight into why we >>>> suddenly have three new AMD devices that advertise FLR support but it >>>> doesn't work? Are we doing something wrong in Linux, or are these devices >>>> defective? >>> +Nehal who handles our USB drivers. >>> >>> Nehal any ideas about FLR or whether it should be advertised? >>> >>> Alex >>> Sorry for the delay. We are looking into this with BIOS team. I shall revert soon on this. >> I had read somewhere that the IO die in the Ryzen/Threadripper >> packages are identical to the ones used in the motherboard chipsets. >> >> Since the latter do reset ok, it would seem a BIOS update of the AGESA >> may potentially fix the issue. >> >> Unfortunately, it's not something motherboard manufacturer's customer >> support people know how to deal with or pass back up the chain to AMD >> engineers. Actual use of this feature seems to be fairly niche. >> >> After I added the workaround for the USB and audio controllers on the >> 3rd-gen Ryzen, I tried contacting Kim Phillips (who I found as a >> kernel committer to x86/cpu/amd), but haven't heard back. >> >> It would be wonderful to know if this can potentially be fixed in CPU >> firmware, and whether there's any likelihood of it actually being >> distributed by motherboard manufacturers. >> >> Marcos >> >> >> > Dear Alex/Nehal > > I wonder if you're able to comment please on whether FLR should be advertised? > > Is there any chance this could be fixed at the bios/AGESA level, and > effectively rolled out? > > Thanks > > Marcos > >>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore. >>>> kernel.org%2Fr%2F20200524003529.598434ff%40f31- >>>> 4.lan&data=02%7C01%7Calexander.deucher%40amd.com%7Ccb77b56b >>>> 62ae47f60f8808d802855759%7C3dd8961fe4884e608e11a82d994e183d%7C0% >>>> 7C0%7C637262119015438912&sdata=3z%2Btn%2Bv2pvUl3X0Tzk%2BLoi >>>> Mk06dLZCmgUOrsGf3kLpY%3D&reserved=0 >>>> AMD Starship USB 3.0 host controller >>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flore. >>>> kernel.org%2Fr%2FCAAri2DpkcuQZYbT6XsALhx2e6vRqPHwtbjHYeiH7MNp4z >>>> mt1RA%40mail.gmail.com&data=02%7C01%7Calexander.deucher%40a >>>> md.com%7Ccb77b56b62ae47f60f8808d802855759%7C3dd8961fe4884e608e11 >>>> a82d994e183d%7C0%7C0%7C637262119015438912&sdata=69GsHB0HCp >>>> 6x0xW0tA%2FrAln0Vy0Yc9I8QSHowebdIxI%3D&reserved=0 >>>> AMD Matisse HD Audio & USB 3.0 host controller ] >>>> >>>> On Sun, May 24, 2020 at 12:35:29AM -0700, Kevin Buettner wrote: >>>>> This commit adds an entry to the quirk_no_flr table for the AMD >>>>> Starship USB 3.0 host controller. >>>>> >>>>> Tested on a Micro-Star International Co., Ltd. MS-7C59/Creator TRX40 >>>>> motherboard with an AMD Ryzen Threadripper 3970X. >>>>> >>>>> Without this patch, when attempting to assign (pass through) an AMD >>>>> Starship USB 3.0 host controller to a guest OS, the system becomes >>>>> increasingly unresponsive over the course of several minutes, >>>>> eventually requiring a hard reset. >>>>> >>>>> Shortly after attempting to start the guest, I see these messages: >>>>> >>>>> May 23 22:59:46 mesquite kernel: vfio-pci 0000:05:00.3: not ready >>>>> 1023ms after FLR; waiting May 23 22:59:48 mesquite kernel: vfio-pci >>>>> 0000:05:00.3: not ready 2047ms after FLR; waiting May 23 22:59:51 >>>>> mesquite kernel: vfio-pci 0000:05:00.3: not ready 4095ms after FLR; >>>>> waiting May 23 22:59:56 mesquite kernel: vfio-pci 0000:05:00.3: not >>>>> ready 8191ms after FLR; waiting >>>>> >>>>> And then eventually: >>>>> >>>>> May 23 23:01:00 mesquite kernel: vfio-pci 0000:05:00.3: not ready >>>>> 65535ms after FLR; giving up May 23 23:01:05 mesquite kernel: INFO: >>>>> NMI handler (perf_event_nmi_handler) took too long to run: 0.000 msecs >>>>> May 23 23:01:06 mesquite kernel: perf: interrupt took too long (642744 >>>>>> 2500), lowering kernel.perf_event_max_sample_rate to 1000 May 23 >>>>> 23:01:07 mesquite kernel: INFO: NMI handler (perf_event_nmi_handler) >>>>> took too long to run: 82.270 msecs May 23 23:01:08 mesquite kernel: INFO: >>>> NMI handler (perf_event_nmi_handler) took too long to run: 680.608 msecs >>>> May 23 23:01:08 mesquite kernel: INFO: NMI handler >>>> (perf_event_nmi_handler) took too long to run: 100.952 msecs ... >>>>> kernel:watchdog: BUG: soft lockup - CPU#3 stuck for 22s! >>>>> [qemu-system-x86:7487] May 23 23:01:25 mesquite kernel: watchdog: >>>> BUG: >>>>> soft lockup - CPU#3 stuck for 22s! [qemu-system-x86:7487] >>>>> >>>>> The above log snippets were obtained using the aforementioned hardware >>>>> running Fedora 32 w/ kernel package kernel-5.6.13-300.fc32.x86_64. My >>>>> fix was applied to a local copy of the F32 kernel package, then >>>>> rebuilt, etc. >>>>> >>>>> With this patch in place, the host kernel doesn't exhibit these >>>>> problems. The guest OS (also Fedora 32) starts up and works as >>>>> expected with the passed-through USB host controller. >>>>> >>>>> Signed-off-by: Kevin Buettner <kevinb@xxxxxxxxxx> >>>> Applied to pci/virtualization for v5.8, thanks! >>>> >>>>> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index >>>>> 43a0c2ce635e..b1db58d00d2b 100644 >>>>> --- a/drivers/pci/quirks.c >>>>> +++ b/drivers/pci/quirks.c >>>>> @@ -5133,6 +5133,7 @@ >>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x443, >>>> quirk_intel_qat_vf_cap); >>>>> * FLR may cause the following to devices to hang: >>>>> * >>>>> * AMD Starship/Matisse HD Audio Controller 0x1487 >>>>> + * AMD Starship USB 3.0 Host Controller 0x148c >>>>> * AMD Matisse USB 3.0 Host Controller 0x149c >>>>> * Intel 82579LM Gigabit Ethernet Controller 0x1502 >>>>> * Intel 82579V Gigabit Ethernet Controller 0x1503 @@ -5143,6 +5144,7 >>>>> @@ static void quirk_no_flr(struct pci_dev *dev) >>>>> dev->dev_flags |= PCI_DEV_FLAGS_NO_FLR_RESET; } >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x1487, quirk_no_flr); >>>>> +DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x148c, >>>> quirk_no_flr); >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_AMD, 0x149c, >>>> quirk_no_flr); >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1502, >>>> quirk_no_flr); >>>>> DECLARE_PCI_FIXUP_EARLY(PCI_VENDOR_ID_INTEL, 0x1503, >>>> quirk_no_flr); Regard Nehal Shah