On Tue, 2018-08-21 at 17:13 -0600, Keith Busch wrote:
> On Wed, Aug 22, 2018 at 09:06:57AM +1000, Benjamin Herrenschmidt wrote:
> > On Tue, 2018-08-21 at 16:04 -0600, Keith Busch wrote:
> > > > I think there needs to be some coordination between pciehp and DPC on
> > > > link state change, yes.
> > > >
> > > > We could still remove the device if recovery fails. For example, on EEH
> > > > we have a threshold, and if a device fails more than N times within the
> > > > last M minutes (I don't remember the exact numbers and I'm not in front
> > > > of the code right now) we give up.
> > > >
> > > > Also, under some circumstances the link will change as a result of the
> > > > error handling doing a hot reset.
> > > >
> > > > For example, if the error happens in a parent bridge and that gets
> > > > reset, the entire hierarchy underneath does too.
> > > >
> > > > We need to save/restore the state of all bridges and devices (BARs
> > > > etc...) below that.
> > >
> > > That's not good enough. You'd at least need to check SLTSTS.PDC on all
> > > bridge DSPs beneath the parent *before* you try to restore the state
> > > of devices beneath them. Those races are primarily why DPC currently
> > > removes and reenumerates instead, because that's not something that can
> > > be readily coordinated today.
> >
> > It can probably be done by a simple test & skip as you go down
> > restoring state, then handling the removals after the dance is
> > complete.
>
> You can't just test the states in config space. You're racing with
> pciehp's ISR, which clears those states.

Sure, I meant you'd call into pciehp. For EEH we do have some linkage
between hotplug and EEH, though a lot of it goes at the FW level for us.

> You'd need to fundamentally change pciehp to (1) not process such
> events until some currently undefined criteria is established, and
> (2) save and export the previously latched states while pciehp bottom
> half processing is stalled.

Ben.
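
P.S. To make the "test & skip" walk above a bit more concrete, here is a
rough, untested sketch. The helper names are made up, the deferred-removal
bookkeeping and locking (pci_bus_sem / pci_walk_bus) are omitted, and, as
you point out, reading SLTSTA directly like this still races with pciehp's
ISR, so a real version would have to query pciehp (or stall its bottom
half) instead:

#include <linux/pci.h>

/* Did this hotplug-capable Downstream Port latch a presence change? */
static bool port_saw_presence_change(struct pci_dev *bridge)
{
	u16 sltsta;

	if (!bridge->is_hotplug_bridge)
		return false;

	/* Racy: pciehp's ISR clears PDC, so this is illustrative only. */
	pcie_capability_read_word(bridge, PCI_EXP_SLTSTA, &sltsta);
	return sltsta & PCI_EXP_SLTSTA_PDC;
}

/*
 * Walk down from the reset bridge, restoring previously saved config
 * state (pci_save_state() must have run earlier).  Subtrees whose
 * downstream port saw a presence change are skipped; their removal
 * would only be handled once the whole walk is complete.
 */
static void restore_subtree(struct pci_bus *bus)
{
	struct pci_dev *dev;

	list_for_each_entry(dev, &bus->devices, bus_list) {
		pci_restore_state(dev);		/* BARs, command, etc. */

		if (!dev->subordinate)
			continue;

		if (port_saw_presence_change(dev))
			continue;	/* skip; defer removal of this subtree */

		restore_subtree(dev->subordinate);
	}
}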