On Tue, Jan 05, 2021 at 11:07:23PM +0000, Kelley, Sean V wrote:
> > On Jan 5, 2021, at 10:33 AM, Keith Busch <kbusch@xxxxxxxxxx> wrote:
> > On Tue, Jan 05, 2021 at 04:06:53PM +0100, Hinko Kocevar wrote:
> >> On 1/5/21 3:21 PM, Hinko Kocevar wrote:
> >>> On 1/5/21 12:02 AM, Keith Busch wrote:
> >>>> Changes from v1:
> >>>>
> >>>>   Added received Acks
> >>>>
> >>>>   Split the kernel print identifying the port type being reset.
> >>>>
> >>>>   Added a patch for the portdrv to ensure the slot_reset happens without
> >>>>   relying on a downstream device driver.
> >>>>
> >>>> Keith Busch (5):
> >>>>   PCI/ERR: Clear status of the reporting device
> >>>>   PCI/AER: Actually get the root port
> >>>>   PCI/ERR: Retain status from error notification
> >>>>   PCI/AER: Specify the type of port that was reset
> >>>>   PCI/portdrv: Report reset for frozen channel
> >>
> >> I removed the patch 5/5 from this patch series, and after testing again, it
> >> makes my setup recover from the injected error; same as observed with v1
> >> series.
> >
> > Thanks for the notice. Unfortunately that seems even more confusing to
> > me right now. That patch shouldn't do anything to the devices or the
> > driver's state; it just ensures a recovery path that was supposed to
> > happen anyway. The stack trace says restoring the config space completed
> > partially before getting stuck at the virtual channel capability, at
> > which point it appears to be in an infinite loop. I'll try to look into
> > it. The emulated devices I test with don't have the VC cap but I might
> > have real devices that do.
>
> I’m not seeing the error either with V2 when testing with aer-inject
> using RCECs and an associated RCiEP.

Thank you. Yes, I'm not seeing a problem on my end either. The sighting
is still concerning though, so I'll keep looking. I may have to ask
Hinko to try a debug patch to help narrow down where things have gone
wrong, if that's okay.