Re: [PATCHv2 16/20] PCI/pciehp: Implement error handling callbacks

Lukas Wunner <lukas@xxxxxxxxx> · Mon, 10 Sep 2018 19:08:48 +0200

On Mon, Sep 10, 2018 at 10:45:28AM -0600, Keith Busch wrote:
> On Mon, Sep 10, 2018 at 06:09:26PM +0200, Lukas Wunner wrote:
> > On Mon, Sep 10, 2018 at 08:56:42AM -0600, Keith Busch wrote:
> > > The sysfs entries still function. Their actions are only temporarily
> > > stalled during error handling. Once the slot reset is called, the
> > > ctrl->pending_events is queried to take requested actions.
> > 
> > Okay I see.  Still, releasing the IRQ and requesting it again seems fairly
> > heavy-wheight.  Why not just acquire ctrl->reset_lock?
> 
> That was looking like a nice way to handle it, but it introduces
> circular locking between ctrl->reset_lock and pci_bus_sem:
> 
> CPU A                               CPU B
> ---------------------------------   ------------------------
> pci_walk_bus                        pciehp_ist
>  down_read(&pci_bus_sem)             down_read(&ctrl->reset_lock);
>   pcie_portdrv_error_detected         pciehp_handle_presence_or_link_change
>    pciehp_error_detected               pciehp_unconfigure_device
>     down_write(&ctrl->reset_lock)       pci_stop_and_remove_bus_devicea
>                                          down_write(&pci_bus_sem);

Why is pciehp bringing down the slot?  Is that in reaction to a
Link Down caused by the error?  Can this be solved with Sinan's
approach to check in pciehp whether PCI_EXP_DEVSTA_FED is set
and if so, waiting for it to be handled?

FWIW you can use synchronize_irq() in pciehp_error_detected()
if you need to wait for the IRQ thread to stop before taking
the reset_lock.

Thanks,

Lukas