On Fri, Jun 25, 2021 at 03:38:41PM -0500, stuart hayes wrote:
> I have a system that is failing to recover after an EDR event with (or
> without...) this patch. It looks like the problem is similar to what this
> patch is trying to fix, except that on my system, the hotplug port is
> downstream of the root port that has DPC, so the "link down" event on it is
> not being ignored. So the hotplug code disables the slot (which contains an
> NVMe device on this system) while the nvme driver is trying to use it, which
> results in a failed recovery and another EDR event, and the kernel ends up
> with the DPC trigger status bit set in the root port, so everything
> downstream is gone.
>
> I added the hack below so the hotplug code will ignore the "link down"
> events on the ports downstream of the root port during DPC recovery, and it
> recovers no problem. (I'm not proposing this as a correct fix.)

Could you test if the below patch fixes the issue?

Note, this is a hack as well, but I can turn it into a proper patch
if it works as expected.

Thanks!

Lukas

-- >8 --
diff --git a/drivers/pci/pcie/portdrv_pci.c b/drivers/pci/pcie/portdrv_pci.c
index c7ff1eea225a..893c7ae1a54d 100644
--- a/drivers/pci/pcie/portdrv_pci.c
+++ b/drivers/pci/pcie/portdrv_pci.c
@@ -160,6 +160,10 @@ static pci_ers_result_t pcie_portdrv_error_detected(struct pci_dev *dev,
 
 static pci_ers_result_t pcie_portdrv_slot_reset(struct pci_dev *dev)
 {
+	if (dev->is_hotplug_bridge)
+		pcie_capability_write_word(dev, PCI_EXP_SLTSTA,
+					   PCI_EXP_SLTSTA_DLLSC);
+
 	pci_restore_state(dev);
 	pci_save_state(dev);
 	return PCI_ERS_RESULT_RECOVERED;