[PATCH] PCI: pciehp: Avoid unnecessary device replacement check

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hot-removal of nested PCI hotplug ports suffers from a long-standing
race condition which can lead to a deadlock:  A parent hotplug port
acquires pci_lock_rescan_remove(), then waits for pciehp to unbind
from a child hotplug port.  Meanwhile that child hotplug port tries to
acquire pci_lock_rescan_remove() as well in order to remove its own
children.

The deadlock only occurs if the parent acquires pci_lock_rescan_remove()
first, not if the child happens to acquire it first.

Several workarounds to avoid the issue have been proposed and discarded
over the years, e.g.:

https://lore.kernel.org/r/4c882e25194ba8282b78fe963fec8faae7cf23eb.1529173804.git.lukas@xxxxxxxxx/

A proper fix is being worked on, but needs more time as it is nontrivial
and necessarily intrusive.

Recent commit 9d573d19547b ("PCI: pciehp: Detect device replacement
during system sleep") provokes more frequent occurrence of the deadlock
when removing more than one Thunderbolt device during system sleep.
The commit sought to detect device replacement, but also triggered on
device removal.  Differentiating reliably between replacement and
removal is impossible because pci_get_dsn() returns 0 both if the device
was removed, as well as if it was replaced with one lacking a Device
Serial Number.

Avoid the more frequent occurrence of the deadlock by checking whether
the hotplug port itself was hot-removed.  If so, there's no sense in
checking whether its child device was replaced.

This works because the ->resume_noirq() callback is invoked in top-down
order for the entire hierarchy:  A parent hotplug port detecting device
replacement (or removal) marks all children as removed using
pci_dev_set_disconnected() and a child hotplug port can then reliably
detect being removed.

Fixes: 9d573d19547b ("PCI: pciehp: Detect device replacement during system sleep")
Reported-by: Kenneth Crudup <kenny@xxxxxxxxx>
Closes: https://lore.kernel.org/r/83d9302a-f743-43e4-9de2-2dd66d91ab5b@xxxxxxxxx/
Reported-by: Chia-Lin Kao (AceLan) <acelan.kao@xxxxxxxxxxxxx>
Closes: https://lore.kernel.org/r/20240926125909.2362244-1-acelan.kao@xxxxxxxxxxxxx/
Tested-by: Kenneth Crudup <kenny@xxxxxxxxx>
Tested-by: Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx>
Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx # v6.11+
---
 drivers/pci/hotplug/pciehp_core.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/hotplug/pciehp_core.c b/drivers/pci/hotplug/pciehp_core.c
index ff458e6..997841c 100644
--- a/drivers/pci/hotplug/pciehp_core.c
+++ b/drivers/pci/hotplug/pciehp_core.c
@@ -286,9 +286,12 @@ static int pciehp_suspend(struct pcie_device *dev)
 
 static bool pciehp_device_replaced(struct controller *ctrl)
 {
-	struct pci_dev *pdev __free(pci_dev_put);
+	struct pci_dev *pdev __free(pci_dev_put) = NULL;
 	u32 reg;
 
+	if (pci_dev_is_disconnected(ctrl->pcie->port))
+		return false;
+
 	pdev = pci_get_slot(ctrl->pcie->port->subordinate, PCI_DEVFN(0, 0));
 	if (!pdev)
 		return true;
-- 
2.43.0





[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux