On Tue, Mar 11, 2025 at 07:27:32AM +0100, Lukas Wunner wrote: > Hot-removal of nested PCI hotplug ports suffers from a long-standing > race condition which can lead to a deadlock: A parent hotplug port > acquires pci_lock_rescan_remove(), then waits for pciehp to unbind > from a child hotplug port. Meanwhile that child hotplug port tries to > acquire pci_lock_rescan_remove() as well in order to remove its own > children. > > The deadlock only occurs if the parent acquires pci_lock_rescan_remove() > first, not if the child happens to acquire it first. > > Several workarounds to avoid the issue have been proposed and discarded > over the years, e.g.: > > https://lore.kernel.org/r/4c882e25194ba8282b78fe963fec8faae7cf23eb.1529173804.git.lukas@xxxxxxxxx/ > > A proper fix is being worked on, but needs more time as it is nontrivial > and necessarily intrusive. > > Recent commit 9d573d19547b ("PCI: pciehp: Detect device replacement > during system sleep") provokes more frequent occurrence of the deadlock > when removing more than one Thunderbolt device during system sleep. > The commit sought to detect device replacement, but also triggered on > device removal. Differentiating reliably between replacement and > removal is impossible because pci_get_dsn() returns 0 both if the device > was removed, as well as if it was replaced with one lacking a Device > Serial Number. > > Avoid the more frequent occurrence of the deadlock by checking whether > the hotplug port itself was hot-removed. If so, there's no sense in > checking whether its child device was replaced. > > This works because the ->resume_noirq() callback is invoked in top-down > order for the entire hierarchy: A parent hotplug port detecting device > replacement (or removal) marks all children as removed using > pci_dev_set_disconnected() and a child hotplug port can then reliably > detect being removed. > > Fixes: 9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep") > Reported-by: Kenneth Crudup <kenny@xxxxxxxxx> > Closes: https://lore.kernel.org/r/83d9302a-f743-43e4-9de2-2dd66d91ab5b@xxxxxxxxx/ > Reported-by: Chia-Lin Kao (AceLan) <acelan.kao@xxxxxxxxxxxxx> > Closes: https://lore.kernel.org/r/20240926125909.2362244-1-acelan.kao@xxxxxxxxxxxxx/ > Tested-by: Kenneth Crudup <kenny@xxxxxxxxx> > Tested-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx> Also, Reviewed-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>