On Fri, Apr 30, 2021 at 04:47:54PM +0800, Yicong Yang wrote: > On 2021/4/30 3:42, Lukas Wunner wrote: > > The 3000 msec were chosen arbitrarily. I couldn't imagine that > > it would ever take longer than that. The spec does not seem to > > mandate a time limit for DPC recovery. But we do need a timeout > > because the DPC Trigger Status bit may never clear and then pciehp > > would wait indefinitely. This can happen if dpc_wait_rp_inactive() > > fails or perhaps because the hardware is buggy. > > The DPC recovery process will not be blocked indefinitely. What about > wait until the DPC process is finished or until the dpc_reset_link() > is finished? We can try to up the link if the DPC recovery failed. As I've indicated above, there's a condition when DPC never completes: According to the PCIe Base Spec, sec. 6.2.10, "After DPC has been triggered in a Root Port that supports RP Extensions for DPC, the Root Port may require some time to quiesce and clean up its internal activities, such as those associated with DMA read Requests. When the DPC Trigger Status bit is Set and the DPC RP Busy bit is Set, software must leave the Root Port in DPC until the DPC RP Busy bit reads 0b." The spec does not mandate a time limit until DPC RP Busy clears: "The DPC RP Busy bit is a means for hardware to indicate to software that the RP needs to remain in DPC containment while the RP does some internal cleanup and quiescing activities. While the details of these activities are implementation specific, the activities will typically complete within a few microseconds or less. However, under worst-case conditions such as those that might occur with certain internal errors in large systems, the busy period might extend substantially, possibly into multiple seconds." Thus, dpc_wait_rp_inactive() polls every 10 msec (for up to HZ, i.e. 1 sec) whether PCI_EXP_DPC_RP_BUSY has been cleared. If it does not clear within 1 sec, the function returns -EBUSY to its caller dpc_reset_link(). Note that according to the spec, we're not allowed to clear the Trigger Status bit as long as the DPC RP Busy bit is set, hence dpc_reset_link() errors out without clearing PCI_EXP_DPC_STATUS_TRIGGER. Because pciehp waits for that bit to clear, it may end up waiting indefinitely for DPC to complete. That's not acceptable, so we do need a timeout to allow pciehp to progress. > I noticed the hotplug interrupt arrives prior to the DPC and then the > wait begins. DPC will clear the Trigger Status in its irq thread. > So not all the time is elapsed by the hardware recovery, but also by > the software process. Considering it's in the irq thread, if we are > preempted and clear the status slower, then we may be timed out. That is correct. If the system is super busy then there's a chance that pciehp may time out because the DPC IRQ thread wasn't scheduled. So the timeout needs to be long enough to accommodate for that. Thanks, Lukas