Re: [PATCH v2] mmc: sdhci-pci-gli: GL975x: Mask rootport's replay timer timeout during suspend

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Dec 21, 2023 at 11:21:47AM +0800, Kai-Heng Feng wrote:
> Spamming `lspci -vv` can still observe the replay timer timeout error
> even after commit 015c9cbcf0ad ("mmc: sdhci-pci-gli: GL9750: Mask the
> replay timer timeout of AER"), albeit with a lower reproduce rate.

I'm not sure what this is telling me.  By "spamming `lspci -vv`, do
you mean that if you run lspci continually, you still see Replay Timer
Timeout logged, e.g.,

  CESta:	... Timeout+

015c9cbcf0ad uses hard-coded PCI_GLI_9750_CORRERR_MASK offset and
PCI_GLI_9750_CORRERR_MASK_REPLAY_TIMER_TIMEOUT value, which look like
they *could* be PCI_ERR_COR_MASK and PCI_ERR_COR_REP_TIMER, but
without the lspci output I can't tell for sure.  If they are, it would
be nice to use the generic macros instead of defining new ones so it's
easier to analyze PCI_ERR_COR_MASK usage.

If 015c9cbcf0ad is updating the generic PCI_ERR_COR_MASK, it should
only prevent sending ERR_COR.  It should not affect the *logging* in
PCI_ERR_COR_STATUS (see PCIe r6.0, sec 6.2.3.2.2), so it shouldn't
affect the lspci output.

> Such AER interrupt can still prevent the system from suspending, so let
> root port mask and unmask replay timer timeout during suspend and
> resume, respectively.

015c9cbcf0ad looks like it masks PCI_ERR_COR_REP_TIMER in the gl975x
Endpoint, while this patch masks it in the upstream bridge (which
might be either a Root Port or a Switch Downstream Port, so the
subject and this sentence are not quite right).

015c9cbcf0ad says it is related to a hardware defect, and maybe this
patch is also (mention it if so).  Both patches can prevent ERR_COR
messages and the eventual AER interrupt, depending on whether the
Downstream Port or the Endpoint detects the Replay Timer Timeout.
Maybe this should have a Fixes: tag for 015c9cbcf0ad to try to keep
these together?

If 015c9cbcf0ad is actually updating PCI_ERR_COR_MASK, it would be
nice to clean that up, too.  And maybe PCI_ERR_COR_REP_TIMER should be
masked/restored at the same place for both the Downstream Port and the
Endpoint?

> +#ifdef CONFIG_PCIEAER
> +static void mask_replay_timer_timeout(struct pci_dev *pdev)
> +{
> +	struct pci_dev *parent = pci_upstream_bridge(pdev);
> +	u32 val;
> +
> +	if (!parent || !parent->aer_cap)
> +		return;
> +
> +	pci_read_config_dword(parent, parent->aer_cap + PCI_ERR_COR_MASK, &val);
> +	val |= PCI_ERR_COR_REP_TIMER;
> +	pci_write_config_dword(parent, parent->aer_cap + PCI_ERR_COR_MASK, val);
> +}
> +
> +static void unmask_replay_timer_timeout(struct pci_dev *pdev)
> +{
> +	struct pci_dev *parent = pci_upstream_bridge(pdev);
> +	u32 val;
> +
> +	if (!parent || !parent->aer_cap)
> +		return;
> +
> +	pci_read_config_dword(pdev, parent->aer_cap + PCI_ERR_COR_MASK, &val);
> +	val &= ~PCI_ERR_COR_REP_TIMER;

I think I would save the previous PCI_ERR_COR_REP_TIMER value and
restore it here, so it is preserved if there is ever a generic
interface via sysfs or similar to manage correctable error masking.

> +	pci_write_config_dword(pdev, parent->aer_cap + PCI_ERR_COR_MASK, val);
> +}




[Index of Archives]     [Linux Memonry Technology]     [Linux USB Devel]     [Linux Media]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux