On Tue, Jan 14, 2025 at 08:25:04PM +0200, Ilpo Järvinen wrote:
> On Tue, 14 Jan 2025, Jiwei wrote:
> > [ 539.362400] ==== pcie_bwnotif_irq 269(stop running),link_status:0x7841
> > [ 539.395720] ==== pcie_bwnotif_irq 247(start running),link_status:0x1041
>
> DLLLA=0
>
> But LBMS did not get reset.
>
> So is this perhaps because hotplug cannot keep up with the rapid
> remove/add going on, and thus will not always call the remove_board()
> even if the device went away?
>
> Lukas, do you know if there's a good way to resolve this within hotplug
> side?

I believe the pciehp code is fine and suspect this is an issue
in the quirk.  We've been dealing with rapid add/remove in pciehp
for years without issues.

I don't understand the quirk sufficiently to make a guess
what's going wrong, but I'm wondering if there could be
a race accessing the lbms_count?

Maybe if lbms_count is replaced by a flag in pci_dev->priv_flags
as we've discussed, with proper memory barriers where necessary,
this problem will solve itself?

Thanks,

Lukas