Re: [PATCH] PCI: Add quirk to clear MSI-X

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/10/2023 6:27 AM, Mario Limonciello wrote:
> On 3/9/23 16:30, Bjorn Helgaas wrote:
>> On Thu, Mar 09, 2023 at 12:32:41PM -0600, Limonciello, Mario wrote:
>>> On 3/9/2023 12:25, Bjorn Helgaas wrote:
>>> ...
>>
>>>>>> https://gitlab.freedesktop.org/agd5f/linux/-/commit/07494a25fc8881e122c242a46b5c53e0e4403139
>>>>>>
>>>>
>>>> That nbio_v7.2.c patch and this patch don't look anything alike.  It
>>>> looks like the nbio_v7.2.c patch might run once?  Could *this* be done
>>>> once at enumeration-time, too?
>>>
>>> They don't look anything alike because they're attacking the problem
>>> from
>>> different angles.
>>
>> Why do we need different angles?
>
> The GPU driver approach only works if the GPU is enabled.  If the GPU
> could never be disabled then it alone would be sufficient.
>
>>
>>> The NBIO patch fixes the initialization value for the internal
>>> registers.
>>> This is what the BIOS "should" have done.  When the internal
>>> registers are
>>> configured properly then the behavior the kernel expects works as well.
>>>
>>> The NBIO patch will run both at amdgpu startup as well as when
>>> resuming from
>>> suspend.
>>
>> If initializing something as BIOS should have done makes the hardware
>> work correctly, isn't once enough?  Why does the NBIO patch need to
>> run at resume-time?
>
> During suspend some internal registers are in a power domain that the
> state will be lost.  These are typically restored by the BIOS to the
> values defined in initialization tables before handing control back to
> the OS.
>
>
>>
>>> This patch we're discussing treats the symptoms of the deficiency
>>> and avoids
>>> the impact.
>>> This patch runs any time the controller is runtime resumed.  So, yes
>>> it will
>>> run more frequently.  Because this patch is treating the symptoms it
>>> needs
>>> to be applied every single time the controller exits D3.
>>
>> This patch runs at *suspend*-time (DECLARE_PCI_FIXUP_SUSPEND), not
>> resume-time.
>>
>> The difference is important because with this broken BIOS, MSI-X is
>> disabled between the suspend quirk and some distant point in resume.
>> With non-broken BIOS, MSI-X remains *enabled* for at least part of
>> that period, and I don't want to have to figure out whether that
>> difference is important.
>
> I'll let Basavaraj comment on the timing here with the behavior
> workaround and sequence of events.

As replied in the previous mail, Bjron's suggestion works well and holds good so I will
change the quirk to apply in resume instead of suspend which also resolves the issue
as below i.e. restoring during resume if MSI-X is enabled works.

static void quirk_restore_msix_en(struct pci_dev *dev)
{
        u16 ctrl;

        pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
        if (!(ctrl & PCI_MSIX_FLAGS_ENABLE))
                return;

        ctrl &= ~PCI_MSIX_FLAGS_ENABLE;
        pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, ctrl);
        ctrl |= PCI_MSIX_FLAGS_ENABLE;
        pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, ctrl);
}
DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_AMD, 0x15b8, quirk_restore_msix_en);


Will change the commit message accordingly.

I guess for the questions below we already answered.
Please let us know if you need more clarifications.


Thanks,
--
Basavaraj

>
>>
>> We have fragments of a coherent commit log, but it's not quite a
>> complete story yet.  I think so far we have:
>>
>>    - Issue affects only the 1022:15b8 USB controller (well, I guess it
>>      also affects some GPU device?)
>
> Same device.  It's just a way to access the internal registers.
>
>>    - Only a problem when BIOS doesn't initialize controller correctly
>>    - Controller claims to preserve internal state on D3hot->D0
>>      transition, but it doesn't
>>    - D0->D3hot->D0 transitions do preserve external PCI_MSIX_FLAGS
>>      state; only internal state is lost
>>    - When MSI-X is enabled and controller transitions D0->D3hot->D0,
>>      MSI-X appears enabled per PCI_MSIX_FLAGS, but is actually
>>      *disabled* because the internal state was lost
>>    - MSI-X being disabled leads to xhci_hcd command timeouts because
>>      interrupts are missed
>>    - Not possible for an enumeration-time quirk to fix the controller
>>      initialization problem (why not?)
>>    - Writing PCI_MSIX_FLAGS with a *different* value fixes the internal
>>      state; writing the same value does nothing
>>    - A suspend- or resume-time quirk can work around this, and this is
>>      safe on *all* 1022:15b8 devices regardless of whether the BIOS is
>>      broken
>>    - The same approach can't be used for both 1022:15b8 and the GPU
>>      device because ...?
>>
>> Bjorn
>




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux