RE: [PATCH] PCI: Add quirk to clear MSI-X

[Public]



> -----Original Message-----
> From: Bjorn Helgaas <helgaas@xxxxxxxxxx>
> Sent: Monday, March 20, 2023 12:15
> To: Limonciello, Mario <Mario.Limonciello@xxxxxxx>
> Cc: Natikar, Basavaraj <Basavaraj.Natikar@xxxxxxx>;
> bhelgaas@xxxxxxxxxx; linux-pci@xxxxxxxxxxxxxxx; thomas@xxxxxxxxxxxx;
> Rafael J. Wysocki <rjw@xxxxxxxxxxxxx>
> Subject: Re: [PATCH] PCI: Add quirk to clear MSI-X
> 
> [+cc Rafael for RESUME_EARLY quirk question]
> 
> On Mon, Mar 20, 2023 at 01:32:16AM +0000, Limonciello, Mario wrote:
> > > -----Original Message-----
> > > From: Bjorn Helgaas <helgaas@xxxxxxxxxx>
> > > Sent: Friday, March 10, 2023 16:14
> > > To: Limonciello, Mario <Mario.Limonciello@xxxxxxx>
> > > Cc: Natikar, Basavaraj <Basavaraj.Natikar@xxxxxxx>; Natikar, Basavaraj
> > > <Basavaraj.Natikar@xxxxxxx>; bhelgaas@xxxxxxxxxx;
> > > linux-pci@xxxxxxxxxxxxxxx; thomas@xxxxxxxxxxxx
> > > Subject: Re: [PATCH] PCI: Add quirk to clear MSI-X
> > >
> > > On Thu, Mar 09, 2023 at 06:57:38PM -0600, Mario Limonciello wrote:
> > > > On 3/9/23 16:30, Bjorn Helgaas wrote:
> > > > > On Thu, Mar 09, 2023 at 12:32:41PM -0600, Limonciello, Mario wrote:
> > > > > > On 3/9/2023 12:25, Bjorn Helgaas wrote:
> > > > > > ...
> > > > >
> > > > > > > > > https://gitlab.freedesktop.org/agd5f/linux/-/commit/07494a25fc8881e122c242a46b5c53e0e4403139
> > > > > > >
> > > > > > > That nbio_v7.2.c patch and this patch don't look anything
> > > > > > > alike.  It looks like the nbio_v7.2.c patch might run
> > > > > > > once?  Could *this* be done once at enumeration-time, too?
> > > > > >
> > > > > > They don't look anything alike because they're attacking the
> > > > > > problem from different angles.
> > > > >
> > > > > Why do we need different angles?
> > > >
> > > > The GPU driver approach only works if the GPU is enabled.  If
> > > > the GPU could never be disabled then it alone would be
> > > > sufficient.
> > > >
> > > > > > The NBIO patch fixes the initialization value for the
> > > > > > internal registers.  This is what the BIOS "should" have
> > > > > > done.  When the internal registers are configured properly
> > > > > > then the behavior the kernel expects works as well.
> > > > > >
> > > > > > The NBIO patch will run both at amdgpu startup as well as
> > > > > > when resuming from suspend.
> > > > >
> > > > > If initializing something as BIOS should have done makes the
> > > > > hardware work correctly, isn't once enough?  Why does the NBIO
> > > > > patch need to run at resume-time?
> > > >
> > > > During suspend, some internal registers are in a power domain
> > > > that loses its state.  These are typically restored by the BIOS
> > > > to the values defined in initialization tables before handing
> > > > control back to the OS.
> > >
> > > I don't quite get this.  I thought I read that if BIOS had
> > > initialized the hardware correctly, a D0->D3hot->D0 transition
> > > would work without any issues.  Linux can do this with PMCSR
> > > writes and BIOS isn't involved at all.
> >
> > During a suspend transition not all registers are powered.  Firmware
> > will capture some during the suspend transition and restore some of
> > them for the resume transition, but it's up to the firmware whether
> > this one is included.
> >
> > Furthermore, most IP blocks in amdgpu are initialized the same way
> > during both startup and resume to ensure that firmware hasn't
> > mucked with the expected golden state.
> 
> We're spending way more time on this than makes sense, but I do think

Yeah..

> it's important that the commit log is accurate and makes sense even to
> people who don't know the internals of the device.
> 
> It *sounds* like what's happening is:
> 
>   - OS writes PMCSR to put device in D3hot
>   - BIOS traps D0->D3hot transition via something like SMI and
>     captures MSI-X state
>   - Device enters D3hot
>   - Device internal MSI-X state is lost
>   - BIOS traps D3hot->D0 transition via SMI
>   - Device enters D0
>   - BIOS restores MSI-X state
>   - OS resumes use of device
> 
> If that's what's happening, the fact that the device loses the
> internal state in D3hot sounds like a *hardware* defect -- if you put
> the device in a system without a BIOS, the D0->D3hot->D0 transitions
> would not work as required by the PCIe spec.

Actually it's a controller integrated into the APU.

So any system you put this APU into has a BIOS.  Because it's a socketed
APU, people can easily move it from one motherboard to another, and one
vendor's BIOS may configure it properly while another's might not.
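The sequence Bjorn hypothesizes can be made concrete with a toy model. This is a minimal userspace sketch in C, assuming (as the thread only hypothesizes) that the device holds some hidden MSI-X context that is lost in D3hot, and that a firmware trap may or may not save and restore it. `struct toy_dev`, `struct toy_fw`, `pmcsr_write()` and `context_survives_d3hot()` are hypothetical names, not kernel or AMD firmware code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model: only D0 and D3hot are modeled. */
enum pm_state { PM_D0 = 0, PM_D3HOT = 3 };

struct toy_dev {
    enum pm_state state;
    unsigned int internal_msix_ctx; /* stand-in for hidden internal MSI-X state */
};

/* Hypothetical firmware trap: may save the hidden context on D0->D3hot
 * and restore it on D3hot->D0. */
struct toy_fw {
    bool has_msix_save_restore;
    unsigned int saved_ctx;
};

/* Model an OS write to PMCSR, with the firmware trapping the transition. */
static void pmcsr_write(struct toy_dev *dev, struct toy_fw *fw,
                        enum pm_state target)
{
    assert(target == PM_D0 || target == PM_D3HOT);

    if (dev->state == PM_D0 && target == PM_D3HOT) {
        if (fw->has_msix_save_restore)
            fw->saved_ctx = dev->internal_msix_ctx; /* trap: save */
        dev->state = PM_D3HOT;
        dev->internal_msix_ctx = 0;                 /* context lost in D3hot */
    } else if (dev->state == PM_D3HOT && target == PM_D0) {
        dev->state = PM_D0;
        if (fw->has_msix_save_restore)
            dev->internal_msix_ctx = fw->saved_ctx; /* trap: restore */
    }
}

/* Run D0 -> D3hot -> D0 and report whether the hidden context survived. */
static bool context_survives_d3hot(bool fw_has_save_restore)
{
    struct toy_dev dev = { .state = PM_D0, .internal_msix_ctx = 0x1234 };
    struct toy_fw fw = { .has_msix_save_restore = fw_has_save_restore,
                         .saved_ctx = 0 };

    pmcsr_write(&dev, &fw, PM_D3HOT);
    pmcsr_write(&dev, &fw, PM_D0);
    return dev.state == PM_D0 && dev.internal_msix_ctx == 0x1234;
}
```

Under these assumptions, `context_survives_d3hot(true)` returns true and `context_survives_d3hot(false)` returns false: in this model only the firmware hook keeps the context alive across D3hot.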

> 
> We can call the fact that BIOS lacks the MSI-X save/restore a BIOS
> defect, but the only reason BIOS would *need* that save/restore is
> because of the underlying *hardware* defect.
> 
> If that's the case, I would expect a commit log something like this:
> 
>   The AMD [1022:15b8] USB controller loses some internal functional
>   MSI-X context when transitioning from D0 to D3hot.  BIOS normally
>   traps D0->D3hot and D3hot->D0 transitions so it can save and restore
>   that internal context, but some firmware in the field lacks this
>   workaround.

I wouldn't call it a workaround.  The hardware is behaving exactly as
intended, given how the firmware programmed it.

> 
>   If MSI-X is enabled, toggle the PCI_MSIX_FLAGS_ENABLE bit when
>   resuming to D0, which resynchronizes the internal state that was
>   lost in D3hot.

Otherwise the commit message sounds good to me.
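The resume-time fix in the proposed commit log (if MSI-X is enabled, toggle PCI_MSIX_FLAGS_ENABLE) can be sketched as follows. This is not the patch at the lore link. It is a minimal userspace illustration in C that reuses the kernel names `struct pci_dev`, `pci_read_config_word()`, `pci_write_config_word()`, `msix_cap`, `PCI_MSIX_FLAGS` and `PCI_MSIX_FLAGS_ENABLE`, but backs them with mock implementations over a 256-byte config-space array. `toggle_msix_enable()` and the write log are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values as defined in include/uapi/linux/pci_regs.h */
#define PCI_MSIX_FLAGS        2      /* Message Control, offset in MSI-X cap */
#define PCI_MSIX_FLAGS_ENABLE 0x8000 /* MSI-X Enable bit */

/* Userspace mock of struct pci_dev; NOT the kernel definition. */
struct pci_dev {
    uint8_t config[256];  /* conventional config space */
    uint8_t msix_cap;     /* MSI-X capability offset, 0 if absent */
    uint16_t writes[8];   /* hypothetical log of config word writes */
    int nwrites;
};

/* Mock config accessors (little-endian), mirroring the kernel names. */
static int pci_read_config_word(struct pci_dev *dev, int where, uint16_t *val)
{
    assert(where >= 0 && where + 1 < (int)sizeof(dev->config));
    *val = (uint16_t)(dev->config[where] | (dev->config[where + 1] << 8));
    return 0;
}

static int pci_write_config_word(struct pci_dev *dev, int where, uint16_t val)
{
    assert(where >= 0 && where + 1 < (int)sizeof(dev->config));
    dev->config[where] = (uint8_t)(val & 0xff);
    dev->config[where + 1] = (uint8_t)(val >> 8);
    if (dev->nwrites < 8)
        dev->writes[dev->nwrites++] = val;
    return 0;
}

/* If MSI-X is enabled, clear and then set the Enable bit so the device
 * resynchronizes its internal MSI-X state. */
static void toggle_msix_enable(struct pci_dev *dev)
{
    uint16_t ctrl;

    if (!dev->msix_cap)
        return; /* device has no MSI-X capability */

    pci_read_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, &ctrl);
    if (!(ctrl & PCI_MSIX_FLAGS_ENABLE))
        return; /* MSI-X not in use; nothing to resynchronize */

    pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS,
                          (uint16_t)(ctrl & ~PCI_MSIX_FLAGS_ENABLE));
    pci_write_config_word(dev, dev->msix_cap + PCI_MSIX_FLAGS, ctrl);
}
```

In a real kernel quirk, such a function would presumably be registered for the [1022:15b8] device with `DECLARE_PCI_FIXUP_RESUME_EARLY(PCI_VENDOR_ID_AMD, 0x15b8, ...)`, the fixup type whose coverage Bjorn asks Rafael about below.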

> 
> Rafael, do we run the DECLARE_PCI_FIXUP_RESUME_EARLY quirks for *all*
> D3hot->D0 transitions?
> 
> I'm concerned about places like pci_pm_reset(), where we do
> D0->D3hot->D0 to do the reset.  Or vfio_pm_config_write(), where it
> looks like a guest could do that without running the quirk.
> 
> Current proposed patch is:
> https://lore.kernel.org/r/ddbbfb50-24b6-202f-7452-c8959901c739@xxxxxxx
> 
> Bjorn



