On Tue, Aug 21, 2018 at 11:07:31AM -0400, Sinan Kaya wrote:
> On 8/21/2018 10:37 AM, Keith Busch wrote:
> > The actions associated with error recovery will trigger link state changes
> > for a lot of existing hardware. PCIEHP currently does the same removal
> > sequence for both link state change (DLLSC) and presence detect change
> > (PDC) events.
> >
> > It sounds like you want pciehp to do nothing on the DLLSC events that it
> > currently handles, and instead do the board removal only on PDC. If that
> > is the case, is the desire to not remove devices downstream a permanently
> > disabled link, or does that responsibility fall onto some other component?
> >
>
> Looking at PDC is not enough. Hotplug driver handles both physical
> removal as well as the link going down due to signal integrity issues today.
>
> If the link went down because of a pending FATAL error, AER/DPC recovers
> the link automatically. There is no need for hotplug to be involved in
> fatal error work.
>
> Hotplug driver needs to handle both physical removal as well as intermittent
> link down issues though.

Back to your patch you linked to earlier, your proposal is to have pciehp
wait for DEVSTS.FED before deciding if it needs to handle the DLLSC event.
That might be a start, but it isn't enough since that status isn't set if
the downstream device reported ERR_FATAL. I think you'd need to check the
secondary status register for a Received System Error.
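
To make the two checks concrete, here is a rough, untested sketch of the
decision as plain userspace C rather than actual pciehp code. The helper
names (fatal_error_pending, should_defer_dllsc) are hypothetical and not
from any posted patch; the bit masks are defined locally and are meant to
match PCI_EXP_DEVSTA_FED and bit 14 of the Secondary Status register in
pci_regs.h. It assumes the caller has already read both registers from the
port. As I understand the spec, Received System Error may also be set for
ERR_NONFATAL messages, so this is only a coarse hint.

```c
#include <stdbool.h>
#include <stdint.h>

/* Device Status (PCIe capability): Fatal Error Detected, bit 2. */
#define DEVSTA_FED                  0x0004
/* Secondary Status (type 1 header): Received System Error, bit 14. */
#define SEC_STATUS_RCVD_SYS_ERR     0x4000

/*
 * Hypothetical helper: true if either the port itself detected a fatal
 * error (DEVSTA.FED) or it received an error message from the device
 * below it (Secondary Status, Received System Error).
 */
static bool fatal_error_pending(uint16_t devsta, uint16_t sec_status)
{
	if (devsta & DEVSTA_FED)
		return true;
	if (sec_status & SEC_STATUS_RCVD_SYS_ERR)
		return true;
	return false;
}

/*
 * Hypothetical policy: on a DLLSC event, let AER/DPC own recovery when
 * an error is pending; otherwise hotplug handles the link change
 * (physical removal, intermittent link down).
 */
static bool should_defer_dllsc(uint16_t devsta, uint16_t sec_status)
{
	return fatal_error_pending(devsta, sec_status);
}
```

With FED clear and only the Received System Error bit set, which is the
case described above where the downstream device reported the error,
should_defer_dllsc() still returns true. With both bits clear it returns
false and hotplug would process the event as it does today.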