On Wednesday, February 27, 2019 1:31:59 PM CET Lukas Wunner wrote: > On Wed, Feb 27, 2019 at 07:52:16PM +0800, Dongdong Liu wrote: > > We do some test to trigger hotplug while do sysfs "remove" operation, > > then met a deadlock issue. > > > > pciehp 0000:00:0c.0:pcie004: Slot(0-1): Link Up > > echo 1 > 0000\:00\:0c.0/remove > > > > PME and hotplug share an MSI/MSI-X vector. > > The sysfs "remove" operation will call the below function. > > remove_store() > > pci_stop_and_remove_bus_device_locked() > > pci_lock_rescan_remove() > > pci_stop_and_remove_bus_device() > > ... > > pcie_pme_remove() > > synchronize_irq() // Here will wait for hotplug IRQ handler > > pci_unlock_rescan_remove(); > > This is (at least) a bug in the PME driver. I'm not familiar with > that driver, so adding Keith Busch to cc. > > The bug is that pcie_pme_remove() invokes pcie_pme_suspend(), which > calls synchronize_irq(), which waits for the hardirq handlers and > IRQ threads of all drivers sharing the IRQ to finish. > > pcie_pme_remove() then calls free_irq() which *again* waits for a > running hardirq handler and thread to finish, however it does not > wait for those of drivers sharing the IRQ to finish since 4.19, > see commit 519cc8652b3a ("genirq: Synchronize only with single thread > on free_irq()"). > > I think the invocation of synchronize_irq() in pcie_pme_suspend() is > unnecessary when called from pcie_pme_remove(), it's been there since > the driver was added by Rafael with commit c7f486567c1d ("PCI PM: PCIe > PME root port service driver"), adding him to cc as well. Thanks! AFAICS, the only part of pcie_pme_suspend() that needs to be done on remove is the one under data->lock. I'll cut a patch to make that change. Cheers, Rafael