> On Jan 5, 2021, at 10:33 AM, Keith Busch <kbusch@xxxxxxxxxx> wrote:
>
> On Tue, Jan 05, 2021 at 04:06:53PM +0100, Hinko Kocevar wrote:
>> On 1/5/21 3:21 PM, Hinko Kocevar wrote:
>>> On 1/5/21 12:02 AM, Keith Busch wrote:
>>>> Changes from v1:
>>>>
>>>>   Added received Acks
>>>>
>>>>   Split the kernel print identifying the port type being reset.
>>>>
>>>>   Added a patch for the portdrv to ensure the slot_reset happens
>>>>   without relying on a downstream device driver.
>>>>
>>>> Keith Busch (5):
>>>>   PCI/ERR: Clear status of the reporting device
>>>>   PCI/AER: Actually get the root port
>>>>   PCI/ERR: Retain status from error notification
>>>>   PCI/AER: Specify the type of port that was reset
>>>>   PCI/portdrv: Report reset for frozen channel
>>
>> I removed patch 5/5 from this series, and after testing again my setup
>> recovers from the injected error, the same as observed with the v1
>> series.
>
> Thanks for the notice. Unfortunately that seems even more confusing to
> me right now. That patch shouldn't do anything to the devices or the
> driver's state; it just ensures a recovery path that was supposed to
> happen anyway. The stack trace says restoring the config space
> completed partially before getting stuck at the virtual channel
> capability, at which point it appears to be in an infinite loop. I'll
> try to look into it. The emulated devices I test with don't have the
> VC cap, but I might have real devices that do.

I'm not seeing the error either with v2 when testing with aer-inject
using RCECs and an associated RCiEP.

Sean