Re: [PATCH] PCI: Add quirk to clear MSI-X

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Mar 09, 2023 at 12:32:41PM -0600, Limonciello, Mario wrote:
> On 3/9/2023 12:25, Bjorn Helgaas wrote:
> ...

> > > > https://gitlab.freedesktop.org/agd5f/linux/-/commit/07494a25fc8881e122c242a46b5c53e0e4403139
> > 
> > That nbio_v7.2.c patch and this patch don't look anything alike.  It
> > looks like the nbio_v7.2.c patch might run once?  Could *this* be done
> > once at enumeration-time, too?
> 
> They don't look anything alike because they're attacking the problem from
> different angles.

Why do we need different angles?

> The NBIO patch fixes the initialization value for the internal registers.
> This is what the BIOS "should" have done.  When the internal registers are
> configured properly then the behavior the kernel expects works as well.
> 
> The NBIO patch will run both at amdgpu startup as well as when resuming from
> suspend.

If initializing something as BIOS should have done makes the hardware
work correctly, isn't once enough?  Why does the NBIO patch need to
run at resume-time?

> This patch we're discussing treats the symptoms of the deficiency and avoids
> the impact.
> This patch runs any time the controller is runtime resumed.  So, yes it will
> run more frequently.  Because this patch is treating the symptoms it needs
> to be applied every single time the controller exits D3.

This patch runs at *suspend*-time (DECLARE_PCI_FIXUP_SUSPEND), not
resume-time.

The difference is important because with this broken BIOS, MSI-X is
disabled between the suspend quirk and some distant point in resume.
With non-broken BIOS, MSI-X remains *enabled* for at least part of
that period, and I don't want to have to figure out whether that
difference is important.

We have fragments of a coherent commit log, but it's not quite a
complete story yet.  I think so far we have:

  - Issue affects only the 1022:15b8 USB controller (well, I guess it
    also affects some GPU device?)
  - Only a problem when BIOS doesn't initialize controller correctly
  - Controller claims to preserve internal state on D3hot->D0
    transition, but it doesn't
  - D0->D3hot->D0 transitions do preserve external PCI_MSIX_FLAGS
    state; only internal state is lost
  - When MSI-X is enabled and controller transitions D0->D3hot->D0,
    MSI-X appears enabled per PCI_MSIX_FLAGS, but is actually
    *disabled* because the internal state was lost
  - MSI-X being disabled leads to xhci_hcd command timeouts because
    interrupts are missed
  - Not possible for an enumeration-time quirk to fix the controller
    initialization problem (why not?)
  - Writing PCI_MSIX_FLAGS with a *different* value fixes the internal
    state; writing the same value does nothing
  - A suspend- or resume-time quirk can work around this, and this is
    safe on *all* 1022:15b8 devices regardless of whether the BIOS is
    broken
  - The same approach can't be used for both 1022:15b8 and the GPU
    device because ...?

Bjorn



[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux