RE: [PATCH] PCI: Add quirk to clear MSI-X

[Public]



> -----Original Message-----
> From: Bjorn Helgaas <helgaas@xxxxxxxxxx>
> Sent: Wednesday, March 8, 2023 16:44
> To: Natikar, Basavaraj <Basavaraj.Natikar@xxxxxxx>
> Cc: bhelgaas@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; Limonciello, Mario
> <Mario.Limonciello@xxxxxxx>; thomas@xxxxxxxxxxxx
> Subject: Re: [PATCH] PCI: Add quirk to clear MSI-X
> 
> Let's mention the vendor and device name in the subject to make the
> log more useful.
> 
> On Mon, Mar 06, 2023 at 12:53:40PM +0530, Basavaraj Natikar wrote:
> > One of the AMD USB controllers fails to maintain internal functional
> > context when transitioning from D3 to D0, desynchronizing MSI-X bits.
> > As a result, add a quirk to this controller to clear the MSI-X bits
> > on suspend.
> 
> Is this a documented erratum?  Please include a citation if so.
> 
> Are there any other AMD USB devices with the same defect?

FYI - it's not a hardware defect, it's a BIOS defect.

> 
> The quirk clears the Function Mask bit, so the MSI-X vectors may be
> *unmasked* depending on the state of each vector's Mask bit.  I assume
> the potential unmasking is safe because you also clear the MSI-X
> Enable bit, so the function can't use MSI-X at all.
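
For reference, the effect of the quirk on the MSI-X Message Control word
can be sketched in plain userspace C.  This is only an illustration: the
two flag values match include/uapi/linux/pci_regs.h, but the helper names
(msix_ctrl_after_quirk, msix_vector_can_fire) are made up here and are
not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h */
#define PCI_MSIX_FLAGS_MASKALL	0x4000	/* Function Mask */
#define PCI_MSIX_FLAGS_ENABLE	0x8000	/* MSI-X Enable */

/* Hypothetical helper: what the quirk leaves in Message Control. */
static uint16_t msix_ctrl_after_quirk(uint16_t ctrl)
{
	return ctrl & (uint16_t)~(PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE);
}

/* Hypothetical helper: can a given vector deliver an MSI-X message? */
static bool msix_vector_can_fire(uint16_t ctrl, bool vector_masked)
{
	if (!(ctrl & PCI_MSIX_FLAGS_ENABLE))
		return false;	/* function may not use MSI-X at all */
	if (ctrl & PCI_MSIX_FLAGS_MASKALL)
		return false;	/* Function Mask overrides per-vector bits */
	return !vector_masked;	/* otherwise the per-vector Mask bit decides */
}
```

With Enable cleared, msix_vector_can_fire() returns false whatever the
per-vector Mask bit says, which is why clearing Function Mask at the same
time should be harmless.
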
> 
> All state is lost in D3cold, so I guess this problem must occur during
> a D3hot to D0 transition, right?  I assume this device sets
> No_Soft_Reset, so the function is supposed to return to D0active with
> all internal state intact.  But this device returns to D0active with
> the MSI-X internal state corrupted?
> 
> I assume this relies on pci_restore_state() to restore the MSI-X
> state.  Seems like that might be enough to restore the internal state
> even without this quirk, but I guess it must not be.

What matters is that the register value actually changes, because the
change is what makes the internal hardware state update.  If the value is
restored identically, nothing changes in the internal hardware.
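
A toy userspace model of that behavior, as a rough sketch only.  The
struct, the function names and the "only a changed value re-syncs the
internals" rule are assumptions made for illustration; they are not a
description of the real controller or of the kernel's restore path.

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_MSIX_FLAGS_MASKALL	0x4000
#define PCI_MSIX_FLAGS_ENABLE	0x8000

/* Toy device: NOT a description of the real controller. */
struct toy_dev {
	uint16_t msix_ctrl;	/* Message Control as seen in config space */
	bool internal_in_sync;	/* assumed internal MSI-X state */
};

/* Assumption: only a write that changes the value re-syncs the internals. */
static void toy_write_ctrl(struct toy_dev *d, uint16_t val)
{
	if (val != d->msix_ctrl)
		d->internal_in_sync = true;
	d->msix_ctrl = val;
}

/* Returns true if the internals are back in sync after resume. */
static bool toy_suspend_resume(uint16_t ctrl, bool with_quirk)
{
	struct toy_dev d = { .msix_ctrl = ctrl, .internal_in_sync = true };
	uint16_t wanted = ctrl;	/* value the OS restores on resume */

	if (with_quirk)		/* suspend: clear Enable and Function Mask */
		toy_write_ctrl(&d, ctrl & (uint16_t)~(PCI_MSIX_FLAGS_MASKALL |
						      PCI_MSIX_FLAGS_ENABLE));

	d.internal_in_sync = false;	/* D3 -> D0: internal context lost */
	toy_write_ctrl(&d, wanted);	/* resume: restore the wanted value */
	return d.internal_in_sync;
}
```

In this model, toy_suspend_resume(0x8007, false) leaves the internals out
of sync because the restore writes back the same value, while
toy_suspend_resume(0x8007, true) recovers because the restore now differs
from what the quirk left in the register.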

> 
> > Note: This quirk works in all scenarios, regardless of whether the
> > integrated GPU is disabled in the BIOS.
> 
> I don't know how the integrated GPU is related to this USB controller,
> but I assume this fact is important somehow?

This is caused by a BIOS bug in the initialization.  In parallel, we also
posted a different workaround in the GPU driver that fixes up the
initialization to match what the BIOS should have set.

It should be going in for 6.3-rc2.
https://gitlab.freedesktop.org/agd5f/linux/-/commit/07494a25fc8881e122c242a46b5c53e0e4403139

But because these are desktop processors, users can decide in BIOS setup
whether the integrated GPU should be enabled or disabled and that
workaround won't work if it's disabled.

> 
> > Cc: Mario Limonciello <mario.limonciello@xxxxxxx>
> > Reported-by: Thomas Glanzmann <thomas@xxxxxxxxxxxx>
> > Link: https://lore.kernel.org/linux-usb/Y%2Fz9GdHjPyF2rNG3@xxxxxxxxxxxx/T/#u
> 
> Apparently the symptom is one of these:
> 
>   xhci_hcd 0000:0c:00.0: Error while assigning device slot ID: Command Aborted
>   xhci_hcd 0000:0c:00.0: Max number of devices this xHCI host supports is 64.
>   usb usb1-port1: couldn't allocate usb_device
>   xhci_hcd 0000:0c:00.0: WARN: xHC save state timeout
>   xhci_hcd 0000:0c:00.0: PM: suspend_common(): xhci_pci_suspend+0x0/0x150 [xhci_pci] returns -110
>   xhci_hcd 0000:0c:00.0: can't suspend (hcd_pci_runtime_suspend [usbcore] returned -110)
> 
> We should include the critical line or two in the commit log to help
> users find the fix.
> 
> I see this must be xhci_suspend() returning -ETIMEDOUT after
> xhci_save_registers(), but I don't see the connection from there to a
> PCI_FIXUP_SUSPEND.  Can you connect the dots for me?
> 
> > Signed-off-by: Basavaraj Natikar <Basavaraj.Natikar@xxxxxxx>
> > ---
> >  drivers/pci/quirks.c | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 44cab813bf95..ddf7100227d3 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -6023,3 +6023,13 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a2d, dpc_log_size);
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a2f, dpc_log_size);
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x9a31, dpc_log_size);
> >  #endif
> > +
> > +static void quirk_clear_msix(struct pci_dev *dev)
> > +{
> > +	u16 ctrl;
> > +
> > +	pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
> > +	ctrl &= ~(PCI_MSIX_FLAGS_MASKALL | PCI_MSIX_FLAGS_ENABLE);
> > +	pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, ctrl);
> > +}
> > +DECLARE_PCI_FIXUP_SUSPEND(PCI_VENDOR_ID_AMD, 0x15b8, quirk_clear_msix);
> > --
> > 2.25.1
> >



