On Wed, May 01, 2024 at 11:50:47PM +0200, Lukas Wunner wrote:
> On Wed, May 01, 2024 at 07:51:18AM -0700, Keith Busch wrote:
> > After a link reset, the Broadcom / LSI PEX890xx PCIe Gen 5 Switch in synth
> > mode will temporarily insert a fake place-holder device, 1000 02b2, before
> > the link is actually active for the expected downstream device. Confirm
> > the device's identifier matches what we expect before moving forward.
> > Otherwise, the pciehp driver may unmask hotplug notifications before
> > the link is actually active, which triggers an undesired device removal.
>
> This won't work if the device was hot-swapped with a different one
> and thus correctly returns a different Vendor/Device ID. We'd wait
> for the device to report the previous device's Vendor/Device ID,
> which doesn't make sense.
>
> It would be possible to raise d3cold_delay in struct pci_dev for
> children of affected Broadcom switches. Have you considered that
> as a potential solution?

Good point, there are more paths I need to consider here.

The path this is addressing is through pciehp's reset_slot handling,
which temporarily disables link change and presence detect
notifications. In the error scenario, the secondary bus reset completes
too quickly, which re-enables the pciehp events before the downstream
device has settled. Once it settles, that triggers a Link Change/PDC
event, and then we lose our device.

I briefly considered a quirk for d3cold_delay, but I was hoping for
something more programmatic than adding an arbitrary delay. That might
be okay though.
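
To make the ordering problem concrete, here is a minimal userspace sketch of the race described above. It is only a model, not kernel code: the struct, its fields, and both helper names are hypothetical, and any timings fed into it are made up for illustration rather than measured on the switch.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the reset_slot race. None of these names exist
 * in the kernel. All times are milliseconds since the secondary bus
 * reset was issued.
 */
struct slot_reset_model {
	unsigned int reset_done_ms;  /* when the bus reset handling returns */
	unsigned int extra_delay_ms; /* additional wait before unmasking */
	unsigned int settle_ms;      /* when the real device's link settles */
};

/* Time at which Link Change / PDC notifications are re-enabled. */
static unsigned int events_unmasked_at(const struct slot_reset_model *m)
{
	return m->reset_done_ms + m->extra_delay_ms;
}

/*
 * If the link settles only after notifications were re-enabled, the
 * resulting Link Change/PDC event reaches the hotplug handler and is
 * treated as a removal. If it settles while they are still masked,
 * the model assumes the event is never delivered.
 */
static bool spurious_removal(const struct slot_reset_model *m)
{
	return m->settle_ms > events_unmasked_at(m);
}
```

With, say, reset_done_ms = 10 and settle_ms = 150, the model reports a spurious removal when extra_delay_ms is 0 and none when it is 200. That is the effect a larger delay would have, and it also shows why a fixed delay is arbitrary: it only helps if it happens to exceed the settle time.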