On Wed, Apr 28, 2021 at 06:08:02PM +0800, Yicong Yang wrote:
> I've tested the patch on our board, but the hotplug will still be
> triggered sometimes.
> seems the hotplug doesn't find the link down event is caused by dpc.
> Any further test I can do?
>
> mestuary:/$ [12508.408576] pcieport 0000:00:10.0: DPC: containment event, status:0x1f21 source:0x0000
> [12508.423016] pcieport 0000:00:10.0: DPC: unmasked uncorrectable error detected
> [12508.434277] pcieport 0000:00:10.0: PCIe Bus Error: severity=Uncorrected (Non-Fatal), type=Transaction Layer, (Completer ID)
> [12508.447651] pcieport 0000:00:10.0:   device [19e5:a130] error status/mask=00008000/04400000
> [12508.458279] pcieport 0000:00:10.0:    [15] CmpltAbrt              (First)
> [12508.467094] pcieport 0000:00:10.0: AER:   TLP Header: 00000000 00000000 00000000 00000000
> [12511.152329] pcieport 0000:00:10.0: pciehp: Slot(0): Link Down

Note that about 3 seconds pass between DPC trigger and hotplug link down
(12508 -> 12511).  That's most likely the 3 second timeout in my patch:

+	/*
+	 * Need a timeout in case DPC never completes due to failure of
+	 * dpc_wait_rp_inactive().
+	 */
+	wait_event_timeout(dpc_completed_waitqueue, dpc_completed(pdev),
+			   msecs_to_jiffies(3000));

If DPC doesn't recover within 3 seconds, pciehp will consider the
error unrecoverable and bring down the slot, no matter what.

I can't tell you why DPC is unable to recover.  Does it help if you
raise the timeout to, say, 5000 msec?

Thanks,

Lukas