[Public] > -----Original Message----- > From: Bjorn Helgaas <helgaas@xxxxxxxxxx> > Sent: Wednesday, March 8, 2023 16:44 > To: Natikar, Basavaraj <Basavaraj.Natikar@xxxxxxx> > Cc: bhelgaas@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; Limonciello, Mario > <Mario.Limonciello@xxxxxxx>; thomas@xxxxxxxxxxxx > Subject: Re: [PATCH] PCI: Add quirk to clear MSI-X > > Let's mention the vendor and device name in the subject to make the > log more useful. > > On Mon, Mar 06, 2023 at 12:53:40PM +0530, Basavaraj Natikar wrote: > > One of the AMD USB controllers fails to maintain internal functional > > context when transitioning from D3 to D0, desynchronizing MSI-X bits. > > As a result, add a quirk to this controller to clear the MSI-X bits > > on suspend. > > Is this a documented erratum? Please include a citation if so. > > Are there any other AMD USB devices with the same defect? FYI - it's not a hardware defect, it's a BIOS defect. > > The quick clears the Function Mask bit, so the MSI-X vectors may be > *unmasked* depending on the state of each vectors Mask bit. I assume > the potential unmasking is safe because you also clear the MSI-X > Enable bit, so the function can't use MSI-X at all. > > All state is lost in D3cold, so I guess this problem must occur during > a D3hot to D0 transition, right? I assume this device sets > No_Soft_Reset, so the function is supposed to return to D0active with > all internal state intact. But this device returns to D0active with > the MSI-X internal state corrupted? > > I assume this relies on pci_restore_state() to restore the MSI-X > state. Seems like that might be enough to restore the internal state > even without this quirk, but I guess it must not be. The important part is the register value changing to make the internal hardware move. Because it restores identically it doesn't change the internal hardware. > > > Note: This quirk works in all scenarios, regardless of whether the > > integrated GPU is disabled in the BIOS. > > I don't know how the integrated GPU is related to this USB controller, > but I assume this fact is important somehow? This bug is due to a BIOS bug with the initialization. We also posted in parallel a different workaround that fixes the initialization to match what the BIOS should have set via the GPU driver. It should be going in for 6.3-rc2. https://gitlab.freedesktop.org/agd5f/linux/-/commit/07494a25fc8881e122c242a46b5c53e0e4403139 But because these are desktop processors, users can decide in BIOS setup whether the integrated GPU should be enabled or disabled and that workaround won't work if it's disabled. > > > Cc: Mario Limonciello <mario.limonciello@xxxxxxx> > > Reported-by: Thomas Glanzmann <thomas@xxxxxxxxxxxx> > > Link: https://lore.kernel.org/linux- > usb/Y%2Fz9GdHjPyF2rNG3@xxxxxxxxxxxx/T/#u > > Apparently the symptom is one of these: > > xhci_hcd 0000:0c:00.0: Error while assigning device slot ID: Command > Aborted > xhci_hcd 0000:0c:00.0: Max number of devices this xHCI host supports is 64. > usb usb1-port1: couldn't allocate usb_device > xhci_hcd 0000:0c:00.0: WARN: xHC save state timeout > xhci_hcd 0000:0c:00.0: PM: suspend_common(): > xhci_pci_suspend+0x0/0x150 [xhci_pci] returns -110 > xhci_hcd 0000:0c:00.0: can't suspend (hcd_pci_runtime_suspend [usbcore] > returned -110) > > We should include the critical line or two in the commit log to help > users find the fix. > > I see this must be xhci_suspend() returning -ETIMEDOUT after > xhci_save_registers(), but I don't see the connection from there to a > PCI_FIXUP_SUSPEND. Can you connect the dots for me? > > > Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@xxxxxxx> > > --- > > drivers/pci/quirks.c | 10 ++++++++++ > > 1 file changed, 10 insertions(+) > > > > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c > > index 44cab813bf95..ddf7100227d3 100644 > > --- a/drivers/pci/quirks.c > > +++ b/drivers/pci/quirks.c > > @@ -6023,3 +6023,13 @@ > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a2d, > dpc_log_size); > > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a2f, > dpc_log_size); > > DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a31, > dpc_log_size); > > #endif > > + > > +static void quirk_clear_msix(struct pci_dev *dev) > > +{ > > + u16 ctrl; > > + > > + pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, > &ctrl); > > + ctrl &= ~(PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE); > > + pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, > ctrl); > > +} > > +DECLARE_PCI_FIXUP_SUSPEND(PCI_VENDOR_ID_AMD, 0x15b8, > quirk_clear_msix); > > -- > > 2.25.1 > >