On Wed, Sep 21, 2022 at 01:03:26PM -0500, Bjorn Helgaas wrote: > On Wed, Sep 21, 2022 at 06:40:20AM -0500, Bjorn Helgaas wrote: > > https://bugzilla.kernel.org/show_bug.cgi?id=216511 [...] > Here's the call chain when handling that DLL state change: > > pciehp_ist > pcie_capability_read_word(pdev, PCI_EXP_SLTSTA, &status) > status &= ... PCI_EXP_SLTSTA_DLLSC > events |= status > if (events & PCI_EXP_SLTSTA_DLLSC) > pciehp_handle_presence_or_link_change > pciehp_disable_slot > __pciehp_disable_slot > remove_board > pciehp_unconfigure_device > pci_stop_and_remove_bus_device > > Per spec, "software must read the Data Link Layer Link Active bit of > the Link Status Register to determine if the Link is active before > initiating configuration cycles to the hot plugged device" (PCIe r6.0, > sec 7.5.3.11). > > It looks like Linux depends on PCI_EXP_SLTSTA_DLLSC but does not > actually read PCI_EXP_LNKSTA in this path, so this looks like a pciehp > defect. I disagree. The spec citation pertains to *bringup* of the slot, but this is the bringdown code path. The logic in pciehp is such that if we receive DLLSC or PDC and the slot is up, we always bring it down. Only then do we check whether the slot is occupied or link is up. If that's the case, we attempt to bring the slot up again. pciehp assumes that the card may have changed when it receives DLLSC or PDC. That's the rationale behind this behavior. In theory one might think that if DLLSC is received without a concurrent PDC event, then the card in the slot is still the same and only the link went down (probably flapped). Unfortunately the reality is not that simple: For one, DLLSC and PDC events may come in arbitrary order and with quite a delay between them. Second, there are broken slots which hardwire PDC to 0 and we support those. So we can't reliably determine if presence hasn't changed and only link has. In this particular case, the PEX switch is clearly broken because it shouldn't signal DLLSC both for a slot where the link change occurred and its sibling. A while ago Jon Derrick submitted a patch for a similar problem: A bifurcated SSD where bringing down one half of the SSD results in a spurious DLLSC event for the other half: https://lore.kernel.org/linux-pci/20210830155628.130054-1-jonathan.derrick@xxxxxxxxx/ I'm not really happy with that patch because it adds a quirk in the middle of the code path for interpreting slot events which makes it difficult to reason about the code's correctness. I'm starting to wonder if instead of Jon's patch, we should just disable DLLSC events on broken devices such as this PEX switch or Jon's SSD. We'd only rely on PDC then but that's probably sufficient. And the code changes would be less intrusive. FWIW, Jon is still interested in upstreaming his quirk: https://lore.kernel.org/linux-pci/446a21e2-aea2-773f-ca88-b6676b54b292@xxxxxxxxx/ @Richard: I think Jon's patch doesn't solve your issue does it? Because I think the issue he's seeing is slightly different albeit likewise caused by unreliable DLLSC. (His pertains to bringdown, yours to bringup of the slot it seems.) Thanks, Lukas