Hello,

> Keith reports a use-after-free when a DPC event occurs concurrently to
> hot-removal of the same portion of the hierarchy:
>
> The dpc_handler() awaits readiness of the secondary bus below the
> Downstream Port where the DPC event occurred. To do so, it polls the
> config space of the first child device on the secondary bus. If that
> child device is concurrently removed, accesses to its struct pci_dev
> cause the kernel to oops.
>
> That's because pci_bridge_wait_for_secondary_bus() neglects to hold a
> reference on the child device. Before v6.3, the function was only
> called on resume from system sleep or on runtime resume. Holding a
> reference wasn't necessary back then because the pciehp IRQ thread
> could never run concurrently. (On resume from system sleep, IRQs are
> not enabled until after the resume_noirq phase. And runtime resume is
> always awaited before a PCI device is removed.)
>
> However starting with v6.3, pci_bridge_wait_for_secondary_bus() is also
> called on a DPC event. Commit 53b54ad074de ("PCI/DPC: Await readiness
> of secondary bus after reset"), which introduced that, failed to
> appreciate that pci_bridge_wait_for_secondary_bus() now needs to hold a
> reference on the child device because dpc_handler() and pciehp may
> indeed run concurrently. The commit was backported to v5.10+ stable
> kernels, so that's the oldest one affected.
>
> Add the missing reference acquisition.
>
> Abridged stack trace:
>
>   BUG: unable to handle page fault for address: 00000000091400c0
>   CPU: 15 PID: 2464 Comm: irq/53-pcie-dpc 6.9.0
>   RIP: pci_bus_read_config_dword+0x17/0x50
>   pci_dev_wait()
>   pci_bridge_wait_for_secondary_bus()
>   dpc_reset_link()
>   pcie_do_recovery()
>   dpc_handler()

Applied to dpc, thank you!

[1/1] PCI/DPC: Fix use-after-free on concurrent DPC and hot-removal
      https://git.kernel.org/pci/pci/c/11a1f4bc4736

	Krzysztof
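
For anyone reading along in the archive, the lifetime rule described in the
quoted commit message can be modelled in user space roughly as follows. This
is only a sketch under stated assumptions, not the applied patch: every
model_* name is hypothetical, the refcount is a plain int rather than the
kernel's atomic reference count, and the concurrent hot-removal is simulated
deterministically inside the polling loop instead of running in a second
thread. In the kernel, the corresponding helpers for struct pci_dev are
pci_dev_get() and pci_dev_put().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct pci_dev: a refcount and a ready flag. */
struct model_dev {
	int refcount;	/* the kernel uses an atomic refcount instead */
	bool ready;	/* stands in for "config space responds" */
};

static int model_devs_freed;	/* lets callers observe when memory is released */

static struct model_dev *model_dev_alloc(void)
{
	struct model_dev *dev = calloc(1, sizeof(*dev));

	if (dev)
		dev->refcount = 1;	/* reference owned by the bus's device list */
	return dev;
}

static struct model_dev *model_dev_get(struct model_dev *dev)
{
	if (dev) {
		assert(dev->refcount > 0);
		dev->refcount++;
	}
	return dev;
}

static void model_dev_put(struct model_dev *dev)
{
	if (!dev)
		return;
	assert(dev->refcount > 0);
	if (--dev->refcount == 0) {
		free(dev);
		model_devs_freed++;
	}
}

/* Models hot-removal: the bus drops its reference and forgets the device. */
static void model_hot_remove(struct model_dev **slot)
{
	model_dev_put(*slot);
	*slot = NULL;
}

/*
 * Models the wait: pin the child, poll it, then unpin it.
 * remove_at_poll simulates a removal racing with that poll iteration.
 * Returns the number of polls performed.
 */
static int model_wait_for_child(struct model_dev **slot, int max_polls,
				int remove_at_poll)
{
	struct model_dev *child = model_dev_get(*slot);
	int polls = 0;

	if (!child)
		return 0;	/* nothing on the bus to wait for */

	while (polls < max_polls && !child->ready) {
		if (polls == remove_at_poll)
			model_hot_remove(slot);
		/* child stays valid here: our own reference keeps it alive */
		polls++;
	}

	model_dev_put(child);	/* last reference gone: freed only now */
	return polls;
}
```

Calling model_wait_for_child(&slot, 5, 2) on a freshly allocated device
removes the device from its slot during the third poll, yet the memory is
released only by the final model_dev_put(). Without the initial
model_dev_get(), the removal would free the device while the loop is still
reading child->ready, which is the kind of access the oops above shows.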