On Mon, Sep 10, 2018 at 10:45:28AM -0600, Keith Busch wrote: > On Mon, Sep 10, 2018 at 06:09:26PM +0200, Lukas Wunner wrote: > > On Mon, Sep 10, 2018 at 08:56:42AM -0600, Keith Busch wrote: > > > The sysfs entries still function. Their actions are only temporarily > > > stalled during error handling. Once the slot reset is called, the > > > ctrl->pending_events is queried to take requested actions. > > > > Okay I see. Still, releasing the IRQ and requesting it again seems fairly > > heavy-wheight. Why not just acquire ctrl->reset_lock? > > That was looking like a nice way to handle it, but it introduces > circular locking between ctrl->reset_lock and pci_bus_sem: > > CPU A CPU B > --------------------------------- ------------------------ > pci_walk_bus pciehp_ist > down_read(&pci_bus_sem) down_read(&ctrl->reset_lock); > pcie_portdrv_error_detected pciehp_handle_presence_or_link_change > pciehp_error_detected pciehp_unconfigure_device > down_write(&ctrl->reset_lock) pci_stop_and_remove_bus_devicea > down_write(&pci_bus_sem); Why is pciehp bringing down the slot? Is that in reaction to a Link Down caused by the error? Can this be solved with Sinan's approach to check in pciehp whether PCI_EXP_DEVSTA_FED is set and if so, waiting for it to be handled? FWIW you can use synchronize_irq() in pciehp_error_detected() if you need to wait for the IRQ thread to stop before taking the reset_lock. Thanks, Lukas