On Tue, Sep 04, 2018 at 08:16:02AM -0600, Keith Busch wrote:
> On Sun, Sep 02, 2018 at 04:27:14PM +0200, Lukas Wunner wrote:
> > On Fri, Aug 31, 2018 at 03:26:37PM -0600, Keith Busch wrote:
> > > This patch adds a channel state to a subordinate bus. When a DPC event is
> > > triggered, the DPC driver will set the channel state to frozen, and the
> > > pciehp driver will ignore link events if the subordinate bus is being
> > > managed by DPC error handling.
> > >
> > > This is safe because the pciehp and DPC drivers share the same
> > > interrupt. The DPC driver sets the bus state in the top-half interrupt
> > > context, and the pciehp driver checks and masks off link events in its
> > > bottom-half error handler.
> >
> > I really liked Sinan's approach of checking in pciehp whether a fatal
> > error is pending and waiting for it to be handled:
> > https://patchwork.ozlabs.org/patch/959464/
> >
> > This seemed to avoid any races with DPC and is small and simple.
> > Can we pursue a solution along those lines?
>
> That introduces a completely different race between the error handling
> and hotplug threads. We don't control which interrupt fires first or
> any way ensure they're even the same event.

pciehp may react quicker than dpc, hence needs to determine a fatal
error is pending without relying on dpc. My understanding is that
this is achieved by Sinan checking PCI_EXP_DEVSTA_FED directly from
pciehp.

For the case when dpc reacts quicker and clears the error before
pciehp checks for PCI_EXP_DEVSTA_FED, you need an additional
synchronization mechanism between dpc and pciehp, such as a flag
that is set by dpc before clearing the error, and that is checked
by pciehp. Though you need to take care that pciehp does not see
a stale flag when the next error occurs.

Thanks,

Lukas
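
To illustrate the ordering described above, here is a rough userspace
model using C11 atomics. It is only a sketch, not kernel code and not
what either patch does. struct sim_port, the dpc_handling flag and all
sim_* functions are hypothetical names made up for this example. Only
PCI_EXP_DEVSTA_FED (bit 0x0004 of the Device Status register) comes
from the real headers. The sketch assumes dpc drops the flag once
recovery has finished, which is one possible way to avoid a stale flag.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Fatal Error Detected bit of the PCIe Device Status register. */
#define PCI_EXP_DEVSTA_FED 0x0004

/* Hypothetical userspace model of one port; not a kernel structure. */
struct sim_port {
	atomic_uint devsta;       /* stands in for the Device Status register */
	atomic_bool dpc_handling; /* hypothetical flag owned by dpc */
};

void sim_port_init(struct sim_port *p)
{
	atomic_init(&p->devsta, 0);
	atomic_init(&p->dpc_handling, false);
}

/* Models the hardware latching a fatal error. */
void sim_raise_fatal_error(struct sim_port *p)
{
	atomic_fetch_or(&p->devsta, PCI_EXP_DEVSTA_FED);
}

/* dpc side: publish the flag first, only then clear the error. */
void sim_dpc_begin_recovery(struct sim_port *p)
{
	atomic_store(&p->dpc_handling, true);
	atomic_fetch_and(&p->devsta, ~(unsigned int)PCI_EXP_DEVSTA_FED);
}

/* dpc side: drop the flag after recovery so it cannot go stale. */
void sim_dpc_end_recovery(struct sim_port *p)
{
	atomic_store(&p->dpc_handling, false);
}

/*
 * pciehp side: read FED first, then the flag. Because dpc sets the
 * flag before clearing FED, a reader that finds FED already cleared
 * by dpc will find the flag set.
 */
bool sim_pciehp_ignore_link_event(struct sim_port *p)
{
	if (atomic_load(&p->devsta) & PCI_EXP_DEVSTA_FED)
		return true;
	return atomic_load(&p->dpc_handling);
}
```

The read order in sim_pciehp_ignore_link_event() matters: checking the
flag before FED would reopen the window in which dpc sets the flag and
clears the error between the two reads.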