Re: [PATCH v2] mmc: sdhci-pci-gli: GL975x: Mask rootport's replay timer timeout during suspend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]


On Sat, Jan 13, 2024 at 1:37 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> On Fri, Jan 12, 2024 at 01:14:42PM +0800, Kai-Heng Feng wrote:
> > On Sat, Jan 6, 2024 at 5:19 AM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > >
> > > On Thu, Dec 21, 2023 at 11:21:47AM +0800, Kai-Heng Feng wrote:
> > > > Spamming `lspci -vv` can still observe the replay timer timeout error
> > > > even after commit 015c9cbcf0ad ("mmc: sdhci-pci-gli: GL9750: Mask the
> > > > replay timer timeout of AER"), albeit with a lower reproduce rate.
> > >
> > > I'm not sure what this is telling me.  By "spamming `lspci -vv`, do
> > > you mean that if you run lspci continually, you still see Replay Timer
> > > Timeout logged, e.g.,
> > >
> > >   CESta:        ... Timeout+
> >
> > Yes it's logged and the AER IRQ is raised.
> IIUC the AER IRQ is the important thing.
> Neither 015c9cbcf0ad nor this patch affects logging in
> PCI_ERR_COR_STATUS, so the lspci output won't change and mentioning it
> here doesn't add useful information.

You are right. That's just a way to access config space to reproduce the issue.

> I'd suggest more specific wording than "spamming `lspci -vv`", e.g.,
>   015c9cbcf0ad ("mmc: sdhci-pci-gli: GL9750: Mask the replay timer
>   timeout of AER") masks Replay Timer Timeout errors at the GL975x
>   Endpoint.  When the Endpoint detects these errors, it still logs
>   them in its PCI_ERR_COR_STATUS, but masking prevents it from sending
>   ERR_COR messages upstream.
>   The Downstream Port leading to a GL975x Endpoint is unaffected by
>   015c9cbcf0ad.  Previously, when that Port detected a Replay Timer
>   Timeout, it sent an ERR_COR message upstream, which eventually
>   caused an AER IRQ, which prevented the system from suspending.
>   Mask Replay Timer Timeout errors at the Downstream Port.  The errors
>   will still be logged in PCI_ERR_COR_STATUS, but no ERR_COR will be
>   sent.

That's phrased much better then mine :)

> > > 015c9cbcf0ad uses hard-coded PCI_GLI_9750_CORRERR_MASK offset and
> > > PCI_GLI_9750_CORRERR_MASK_REPLAY_TIMER_TIMEOUT value, which look like
> > > they *could* be PCI_ERR_COR_MASK and PCI_ERR_COR_REP_TIMER, but
> > > without the lspci output I can't tell for sure.  If they are, it would
> > > be nice to use the generic macros instead of defining new ones so it's
> > > easier to analyze PCI_ERR_COR_MASK usage.
> > >
> > > If 015c9cbcf0ad is updating the generic PCI_ERR_COR_MASK, it should
> > > only prevent sending ERR_COR.  It should not affect the *logging* in
> > > PCI_ERR_COR_STATUS (see PCIe r6.0, sec, so it shouldn't
> > > affect the lspci output.
> >
> > PCI_GLI_9750_CORRERR_MASK is specific to GLI 975x devices, so it
> > doesn't conform to generic PCI_ERR_COR_STATUS behavior.
> *Could* 015c9cbcf0ad have used the generic PCI_ERR_COR_MASK to
> accomplish the same effect?  Is there an advantage to using the
> device-specific PCI_GLI_9750_CORRERR_MASK?
> If masking via PCI_ERR_COR_MASK would work, that would be much better
> because the PCI core can see, manage, and make that visible, e.g., via
> sysfs.  The core doesn't do that today, but people are working on it.

I don't think so. Please see below.

> > > If 015c9cbcf0ad is actually updating PCI_ERR_COR_MASK, it would be
> > > nice to clean that up, too.  And maybe PCI_ERR_COR_REP_TIMER should be
> > > masked/restored at the same place for both the Downstream Port and the
> > > Endpoint?
> >
> > Since PCI_ERR_COR_REP_TIMER is already masked before 015c9cbcf0ad,
> > so I didn't think that's necessary.  Do you think it should still be
> > masked just to be safe?
> Did you mean "PCI_ERR_COR_REP_TIMER is already masked *by*
> 015c9cbcf0ad"?

No. The PCI_ERR_COR_REP_TIMER is masked with or without 015c9cbcf0ad.
That means before 015c9cbcf0ad, Reply Timeout error was reported with

So using PCI_GLI_9750_CORRERR_MASK is necessary for the endpoint.

> If masking PCI_ERR_COR_REP_TIMER using the generic PCI_ERR_COR_MASK in
> the GL975x would have the same effect as masking it with
> PCI_GLI_9750_CORRERR_MASK, then I think you should *only* use the
> generic PCI_ERR_COR_MASK.
> No need to do both if the generic one is sufficient.  And I think both
> should be done in the same place since they're basically solving the
> same problem, just at both ends of the link.

Do you mean only mask PCI_GLI_9750_CORRERR_MASK during suspend?
That will not be ideal because accessing its config space (e.g. `lspci
-vv`) will have many errors logged.


> Bjorn

[Index of Archives]     [Linux Memonry Technology]     [Linux USB Devel]     [Linux Media]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux