Re: [PATCH] PCI: pciehp: Ignore Link Down/Up caused by DPC

Lukas Wunner <lukas@xxxxxxxxx> · Fri, 30 Apr 2021 14:15:49 +0200

On Fri, Apr 30, 2021 at 04:47:54PM +0800, Yicong Yang wrote:
> On 2021/4/30 3:42, Lukas Wunner wrote:
> > The 3000 msec were chosen arbitrarily.  I couldn't imagine that
> > it would ever take longer than that.  The spec does not seem to
> > mandate a time limit for DPC recovery.  But we do need a timeout
> > because the DPC Trigger Status bit may never clear and then pciehp
> > would wait indefinitely.  This can happen if dpc_wait_rp_inactive()
> > fails or perhaps because the hardware is buggy.
> 
> The DPC recovery process will not be blocked indefinitely. What about
> wait until the DPC process is finished or until the dpc_reset_link()
> is finished? We can try to up the link if the DPC recovery failed.

As I've indicated above, there's a condition when DPC never completes:

According to the PCIe Base Spec, sec. 6.2.10,

   "After DPC has been triggered in a Root Port that supports RP Extensions
    for DPC, the Root Port may require some time to quiesce and clean up
    its internal activities, such as those associated with DMA read Requests.
    When the DPC Trigger Status bit is Set and the DPC RP Busy bit is Set,
    software must leave the Root Port in DPC until the DPC RP Busy bit
    reads 0b."

The spec does not mandate a time limit until DPC RP Busy clears:

   "The DPC RP Busy bit is a means for hardware to indicate to software
    that the RP needs to remain in DPC containment while the RP does
    some internal cleanup and quiescing activities. While the details
    of these activities are implementation specific, the activities will
    typically complete within a few microseconds or less. However, under
    worst-case conditions such as those that might occur with certain
    internal errors in large systems, the busy period might extend
    substantially, possibly into multiple seconds."

Thus, dpc_wait_rp_inactive() polls every 10 msec (for up to HZ, i.e. 1 sec)
whether PCI_EXP_DPC_RP_BUSY has been cleared.  If it does not clear
within 1 sec, the function returns -EBUSY to its caller dpc_reset_link().
Note that according to the spec, we're not allowed to clear the
Trigger Status bit as long as the DPC RP Busy bit is set, hence
dpc_reset_link() errors out without clearing PCI_EXP_DPC_STATUS_TRIGGER.

Because pciehp waits for that bit to clear, it may end up waiting
indefinitely for DPC to complete.  That's not acceptable, so we do
need a timeout to allow pciehp to progress.

> I noticed the hotplug interrupt arrives prior to the DPC and then the
> wait begins. DPC will clear the Trigger Status in its irq thread.
> So not all the time is elapsed by the hardware recovery, but also by
> the software process. Considering it's in the irq thread, if we are
> preempted and clear the status slower, then we may be timed out.

That is correct.  If the system is super busy then there's a chance
that pciehp may time out because the DPC IRQ thread wasn't scheduled.
So the timeout needs to be long enough to accommodate for that.

Thanks,

Lukas