Re: [PATCH] PCI: brcmstb: Avoid downstream access during link training

Bjorn Helgaas <helgaas@xxxxxxxxxx> · Mon, 23 Oct 2023 19:51:36 -0500

On Sun, Aug 06, 2023 at 06:44:50AM +0200, Lukas Wunner wrote:
> The Broadcom Set Top Box PCIe controller signals an Asynchronous SError
> Interrupt and thus causes a kernel panic when non-posted transactions
> time out.  This differs from most other PCIe controllers which return a
> fabricated "all ones" response instead.
> 
> To avoid gratuitous kernel panics, the driver reads the link status from
> a proprietary PCIE_MISC_PCIE_STATUS register and skips downstream
> accesses if the link is down.
> 
> However the bits in the proprietary register may purport that the link
> is up even though link training is still in progress (as indicated by
> the Link Training bit in the Link Status register).
> 
> This has been observed with various PCIe switches attached to a BCM2711
> (Raspberry Pi CM4):  The issue is most pronounced with the Pericom
> PI7C9X2G404SV, but has also occasionally been witnessed with the Pericom
> PI7C9X2G404SL and ASMedia ASM1184e.

So somebody is seeing kernel panics when these switches are connected?
Do we have pointers to those reports that we can reference here?

> Check the Link Training bit in addition to the PCIE_MISC_PCIE_STATUS
> register before performing downstream accesses.

I guess the theory is that link training takes longer than usual with
these devices?  Is the idea here that we wait longer in
brcm_pcie_start_link()?

Or is it that we avoid config accesses to downstream devices while
link training is still in progress?  This seems like it would be
problematic (see below).

> Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> ---
>  drivers/pci/controller/pcie-brcmstb.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index f593a422bd63..b4abfced8e9b 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -679,8 +679,10 @@ static bool brcm_pcie_link_up(struct brcm_pcie *pcie)
>  	u32 val = readl(pcie->base + PCIE_MISC_PCIE_STATUS);
>  	u32 dla = FIELD_GET(PCIE_MISC_PCIE_STATUS_PCIE_DL_ACTIVE_MASK, val);
>  	u32 plu = FIELD_GET(PCIE_MISC_PCIE_STATUS_PCIE_PHYLINKUP_MASK, val);
> +	u16 lnksta = readw(pcie->base + BRCM_PCIE_CAP_REGS + PCI_EXP_LNKSTA);
> +	u16 lt = FIELD_GET(PCI_EXP_LNKSTA_LT, lnksta);
>  
> -	return dla && plu;
> +	return dla && plu && !lt;

It looks like this will make config accesses to downstream devices
fail while PCI_EXP_LNKSTA_LT is set by making brcm_pcie_link_up()
return false, which makes brcm_pcie_map_bus() return NULL, which will
make pci_generic_config_read() return PCIBIOS_DEVICE_NOT_FOUND without
attempting the config read.
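
To make that chain concrete, here is a rough user-space model in C.  It
is only a sketch, not the driver code: struct model_pcie and every
model_* function are made up for illustration, and only the two
PCIBIOS_* values are taken from the kernel headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCIBIOS_SUCCESSFUL        0x00
#define PCIBIOS_DEVICE_NOT_FOUND  0x86

/* Hypothetical stand-in for the state the driver samples from hardware. */
struct model_pcie {
	bool dla;          /* data link layer active (proprietary status) */
	bool plu;          /* PHY link up (proprietary status) */
	bool lt;           /* Link Training bit from the Link Status register */
	uint32_t cfg_word; /* pretend config dword of the downstream device */
};

/* Models the patched link-up check: training in progress counts as down. */
static bool model_link_up(const struct model_pcie *pcie)
{
	return pcie->dla && pcie->plu && !pcie->lt;
}

/* Models the map_bus step: no mapping is handed out while the link is "down". */
static uint32_t *model_map_bus(struct model_pcie *pcie)
{
	return model_link_up(pcie) ? &pcie->cfg_word : NULL;
}

/* Models the generic read: a NULL mapping fails without touching hardware. */
static int model_config_read(struct model_pcie *pcie, uint32_t *val)
{
	uint32_t *addr;

	assert(val != NULL);
	addr = model_map_bus(pcie);
	if (!addr)
		return PCIBIOS_DEVICE_NOT_FOUND;

	*val = *addr;
	return PCIBIOS_SUCCESSFUL;
}
```

With dla and plu set and lt still set, model_config_read() fails
without ever dereferencing the mapping; once lt clears, the same call
succeeds.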

So this should avoid the SError (mostly, at least; I'm sure this is
still racy), but what about the config access?  Presumably the caller
depends on it happening, and it sounds like it *would* happen if we
tried a little later.  I don't think we can count on the caller to
retry a failed access, e.g., enumeration config reads that return ~0
are just interpreted as "there's no device here."
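
As a rough illustration of that concern, the following user-space C
sketch models a single enumeration pass.  The model_* names and
MODEL_ERROR_RESPONSE are hypothetical, the ID check is a simplified
version of the kind of test the PCI core applies to the Vendor/Device
ID dword, and Configuration Request Retry Status handling is left out
entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Value a failed or skipped config read is reported as ("all ones"). */
#define MODEL_ERROR_RESPONSE 0xffffffffu

/* Simplified check on the Vendor/Device ID dword: these patterns mean "no device". */
static bool model_id_valid(uint32_t id)
{
	return !(id == 0xffffffffu || id == 0x00000000u ||
		 id == 0x0000ffffu || id == 0xffff0000u);
}

/* One pass over the slots, one read each, no retry: count the devices found. */
static size_t model_scan_once(const uint32_t *ids, size_t nslots)
{
	size_t found = 0;

	assert(ids != NULL);
	for (size_t i = 0; i < nslots; i++)
		if (model_id_valid(ids[i]))
			found++;
	return found;
}
```

A slot whose read was skipped during link training shows up as
MODEL_ERROR_RESPONSE and is counted as empty, even though the same
slot would report a valid ID a moment later.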

Maybe the real issue is that we need to make brcm_pcie_start_link()
wait longer, where we aren't attempting a config read?
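
Something along these lines is what I have in mind, written as a
user-space C model rather than a patch.  All names here (struct
model_link, model_start_link(), the poll counters) are invented for
the sketch, the increment stands in for a sleep between polls, and
the -1 return stands in for whatever error the driver would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical link that changes state after a given number of polls. */
struct model_link {
	int polls_until_status_up; /* polls before the proprietary status says "up" */
	int polls_until_trained;   /* polls before the Link Training bit clears */
	int polls_done;            /* polls performed so far */
};

static bool model_status_up(const struct model_link *l)
{
	return l->polls_done >= l->polls_until_status_up;
}

static bool model_training(const struct model_link *l)
{
	return l->polls_done < l->polls_until_trained;
}

/*
 * Poll until the status register reports link up AND training has
 * finished, giving up after max_polls checks.  Returns 0 on success,
 * -1 on timeout.  No config access is attempted while waiting.
 */
static int model_start_link(struct model_link *l, int max_polls)
{
	assert(l != NULL && max_polls > 0);

	for (int i = 0; i < max_polls; i++) {
		if (model_status_up(l) && !model_training(l))
			return 0;
		l->polls_done++; /* stands in for a sleep between polls */
	}
	return -1;
}
```

The point of the sketch is that the wait happens once, before
enumeration, so a link that trains slowly costs some startup time
instead of a failed config read.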

Jim, are you still interested in testing this?

>  }
>  
>  static void __iomem *brcm_pcie_map_bus(struct pci_bus *bus,
> -- 
> 2.39.2
>