On Wed, Aug 22, 2018 at 09:06:57AM +1000, Benjamin Herrenschmidt wrote:
> On Tue, 2018-08-21 at 16:04 -0600, Keith Busch wrote:
> > > I think there needs to be some coordination between pciehp and DPC on
> > > link state change, yes.
> > >
> > > We could still remove the device if recovery fails. For example, on EEH
> > > we have a threshold, and if a device fails more than N times within the
> > > last M minutes (I don't remember the exact numbers and I'm not in front
> > > of the code right now) we give up.
> > >
> > > Also, under some circumstances the link will change as a result of the
> > > error handling doing a hot reset.
> > >
> > > For example, if the error happens in a parent bridge and that gets
> > > reset, the entire hierarchy underneath does too.
> > >
> > > We need to save/restore the state of all bridges and devices (BARs
> > > etc...) below that.
> >
> > That's not good enough. You'd at least need to check SLTSTS.PDC on all
> > bridge DSPs beneath the parent *before* you try to restore the state
> > of devices beneath them. Those races are primarily why DPC currently
> > removes and reenumerates instead, because that's not something that can
> > be readily coordinated today.
>
> It can probably be done by a simple test & skip as you go down
> restoring state, then handling the removals after the dance is
> complete.

You can't just test the states in config space. You're racing with
pciehp's ISR, which clears those states. You'd need to fundamentally
change pciehp to (1) not process such events until some currently
undefined criteria are established, and (2) save and export the
previously latched states while pciehp bottom-half processing is
stalled.