On Thu, Apr 29, 2021 at 07:29:59PM +0800, Yicong Yang wrote:
> On 2021/4/28 22:40, Lukas Wunner wrote:
> > If DPC doesn't recover within 3 seconds, pciehp will consider the
> > error unrecoverable and bring down the slot, no matter what.
> >
> > I can't tell you why DPC is unable to recover.  Does it help if you
> > raise the timeout to, say, 5000 msec?
>
> I raise the timeout to 4s and it works well. I dump the remained jiffies in
> the log and find sometimes the recovery will take a bit more than 3s:

Thanks for testing.  I'll respin the patch and raise the timeout
to 4000 msec.

The 3000 msec were chosen arbitrarily.  I couldn't imagine that
it would ever take longer than that.  The spec does not seem to
mandate a time limit for DPC recovery.  But we do need a timeout
because the DPC Trigger Status bit may never clear and then pciehp
would wait indefinitely.  This can happen if dpc_wait_rp_inactive()
fails or perhaps because the hardware is buggy.

I'll amend the patch to clarify that the timeout is just a reasonable
heuristic and not a value provided by the spec.

Which hardware did you test this on?  Is this a HiSilicon platform
or Intel?

Thanks!

Lukas
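
For illustration only, the bounded wait discussed above can be sketched
in plain user-space C.  This is not the code from the patch: struct
fake_port, trigger_status_set() and wait_for_recovery() are hypothetical
names, and time is simulated with a counter rather than jiffies or a
real sleep.  It only shows why a timeout is needed when the status bit
might never clear.

```c
#include <stdbool.h>
#include <limits.h>

/* Hypothetical stand-in for a port whose DPC Trigger Status bit clears
 * at a given simulated time.  UINT_MAX models a bit that never clears. */
struct fake_port {
	unsigned int clears_at_ms;
};

/* Hypothetical accessor: true while the simulated status bit is set. */
static bool trigger_status_set(const struct fake_port *p, unsigned int now_ms)
{
	return now_ms < p->clears_at_ms;
}

/* Poll until the bit clears or timeout_ms of simulated time has passed.
 * Returns true on recovery, false on timeout. */
static bool wait_for_recovery(const struct fake_port *p,
			      unsigned int timeout_ms, unsigned int poll_ms)
{
	unsigned int elapsed = 0;

	if (poll_ms == 0)
		poll_ms = 1;	/* avoid spinning forever on a zero interval */

	for (;;) {
		if (!trigger_status_set(p, elapsed))
			return true;	/* bit cleared: recovery done */
		if (elapsed >= timeout_ms)
			return false;	/* give up: treat as unrecoverable */
		elapsed += poll_ms;	/* simulated sleep */
	}
}
```

In this model, a port that needs slightly more than 3000 msec is
reported as failed with a 3000 msec limit but as recovered with
4000 msec, and a port whose bit never clears still returns after the
limit instead of blocking indefinitely.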