On Sun, Aug 06, 2023 at 06:44:50AM +0200, Lukas Wunner wrote:
> The Broadcom Set Top Box PCIe controller signals an Asynchronous SError
> Interrupt and thus causes a kernel panic when non-posted transactions
> time out. This differs from most other PCIe controllers which return a
> fabricated "all ones" response instead.
>
> To avoid gratuitous kernel panics, the driver reads the link status from
> a proprietary PCIE_MISC_PCIE_STATUS register and skips downstream
> accesses if the link is down.
>
> However the bits in the proprietary register may purport that the link
> is up even though link training is still in progress (as indicated by
> the Link Training bit in the Link Status register).
>
> This has been observed with various PCIe switches attached to a BCM2711
> (Raspberry Pi CM4): The issue is most pronounced with the Pericom
> PI7C9X2G404SV, but has also occasionally been witnessed with the Pericom
> PI7C9X2G404SL and ASMedia ASM1184e.

So somebody is seeing kernel panics when these switches are connected?
Do we have pointers to those reports that we can reference here?

> Check the Link Training bit in addition to the PCIE_MISC_PCIE_STATUS
> register before performing downstream accesses.

I guess the theory is that link training takes longer than usual with
these devices?

Is the idea here that we wait longer in brcm_pcie_start_link()? Or is
it that we avoid config accesses to downstream devices while the link
is not yet up? This seems like it would be problematic (see below).

> Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> ---
>  drivers/pci/controller/pcie-brcmstb.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/controller/pcie-brcmstb.c b/drivers/pci/controller/pcie-brcmstb.c
> index f593a422bd63..b4abfced8e9b 100644
> --- a/drivers/pci/controller/pcie-brcmstb.c
> +++ b/drivers/pci/controller/pcie-brcmstb.c
> @@ -679,8 +679,10 @@ static bool brcm_pcie_link_up(struct brcm_pcie *pcie)
>  	u32 val = readl(pcie->base + PCIE_MISC_PCIE_STATUS);
>  	u32 dla = FIELD_GET(PCIE_MISC_PCIE_STATUS_PCIE_DL_ACTIVE_MASK, val);
>  	u32 plu = FIELD_GET(PCIE_MISC_PCIE_STATUS_PCIE_PHYLINKUP_MASK, val);
> +	u16 lnksta = readw(pcie->base + BRCM_PCIE_CAP_REGS + PCI_EXP_LNKSTA);
> +	u16 lt = FIELD_GET(PCI_EXP_LNKSTA_LT, lnksta);
>
> -	return dla && plu;
> +	return dla && plu && !lt;

It looks like this will make config accesses to downstream devices fail
while PCI_EXP_LNKSTA_LT is set by making brcm_pcie_link_up() return
false, which makes brcm_pcie_map_bus() return NULL, which will make
pci_generic_config_read() return PCIBIOS_DEVICE_NOT_FOUND without
attempting the config read.

So this should avoid the SError (mostly, at least; I'm sure this is
still racy), but what about the config access? Presumably the caller
depends on it happening, and it sounds like it *would* happen if we
tried a little later.

I don't think we can count on the caller to retry a failed access,
e.g., enumeration config reads that return ~0 are just interpreted as
"there's no device here."

Maybe the real issue is that we need to make brcm_pcie_start_link()
wait longer, where we aren't attempting a config read?

Jim, are you still interested in testing this?

>  }
>
>  static void __iomem *brcm_pcie_map_bus(struct pci_bus *bus,
> --
> 2.39.2
>
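
To make the "wait longer in brcm_pcie_start_link()" idea a bit more
concrete, here is a rough userspace model of what I have in mind. It is
only a sketch, not driver code: struct fake_link, fake_link_up(),
fake_delay() and fake_wait_for_link() are all made-up names, the
"hardware" is simulated, and the poll budget is arbitrary. The only
value taken from the real headers is the Link Training bit (0x0800,
i.e. PCI_EXP_LNKSTA_LT). The point is that the polling loop treats
"still training" as "not up yet" and keeps waiting, instead of letting
a later config access fail.

```c
#include <stdbool.h>
#include <stdint.h>

/* Link Training bit in the PCIe Link Status register (PCI_EXP_LNKSTA_LT). */
#define LNKSTA_LT 0x0800u

/* Hypothetical stand-in for the controller state; not a driver structure. */
struct fake_link {
	bool dla;                /* models the DL_ACTIVE status bit */
	bool plu;                /* models the PHYLINKUP status bit */
	uint16_t lnksta;         /* models the Link Status register */
	int training_polls_left; /* simulation only: delays until LT clears */
};

/* Same predicate as the patched brcm_pcie_link_up(): up only if not training. */
static bool fake_link_up(const struct fake_link *l)
{
	bool lt = (l->lnksta & LNKSTA_LT) != 0;

	return l->dla && l->plu && !lt;
}

/* Stands in for a sleep; the simulated hardware finishes training over time. */
static void fake_delay(struct fake_link *l)
{
	if (l->training_polls_left > 0 && --l->training_polls_left == 0)
		l->lnksta &= (uint16_t)~LNKSTA_LT;
}

/*
 * Poll until the link is up and training has completed.
 * Returns the number of delays needed, or -1 if max_polls was exhausted.
 */
static int fake_wait_for_link(struct fake_link *l, int max_polls)
{
	for (int i = 0; i < max_polls; i++) {
		if (fake_link_up(l))
			return i;
		fake_delay(l);
	}
	return fake_link_up(l) ? max_polls : -1;
}
```

With something along these lines, a link that reports "up" in the
proprietary register but is still training would simply consume more of
the wait budget during link bring-up, and only a genuine timeout would
be reported as "link down." Whether the existing budget is long enough
for the switches mentioned above is something that would need testing
on the affected hardware.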