Re: [PATCH v2] mmc: sdhci-pci-gli: GL975x: Mask rootport's replay timer timeout during suspend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Mar 26, 2024 at 09:52:28AM +0800, Kai-Heng Feng wrote:
> On Tue, Mar 26, 2024 at 3:02 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Mon, Mar 25, 2024 at 10:02:27AM +0800, Kai-Heng Feng wrote:
> > > On Sat, Mar 23, 2024 at 12:43 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> ...

> > > > If that's the case, why do the
> > > > masking in the suspend/resume callbacks?
> > >
> > > Because there's no functional impact when the error happens, other
> > > than suspend/resume.
> >
> > Oh, I think I see.  Is this accurate?
> >
> >   Due to a hardware defect in GL975x, config accesses when ASPM is
> >   enabled frequently cause Replay Timer Timeouts in the Port leading
> >   to the device.
> >
> >   These are Correctable Errors, so the Downstream Port logs it in its
> >   PCI_ERR_COR_STATUS and, when the error is not masked, sends an
> >   ERR_COR message upstream.  The message terminates at a Root Port,
> >   which may generate an AER interrupt so the OS can log it.
> >
> >   The Correctable Error logging is an annoyance but normally not a
> >   major issue.  But when the AER interrupt happens during suspend, it
> >   can prevent the system from suspending.
> 
> That's totally the case here.
> 
> This brings up another different but related topic  - should the port
> driver disable AER/DPC IRQ during suspend?
> We've discussed this many times, I still think that's the right
> approach to "quiesce" many unexpected errors during system state
> transition.

Maybe so.  We can continue that in the context of that patch.  Maybe
it needs to be reposted; I can't remember where it's at right now.

> > I think we should log a hint in dmesg that we're masking
> > PCI_ERR_COR_REP_TIMER because the error will still be logged in the
> > PCI_ERR_COR_STATUS register, and that will be visible via lspci, and a
> > dmesg hint will save debugging time when people report that.
> 
> Sure. Where do you think it's a better place to implement the quirk? I
> Assume PCI quirk is a better place than driver's probe routine?

Yes, I think drivers/pci/quirks.c is a better place so we can mask it
even if the driver isn't loaded.  Users can still run lspci and see
these errors even if the driver isn't loaded.

Bjorn




[Index of Archives]     [Linux Memonry Technology]     [Linux USB Devel]     [Linux Media]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux