From: Sinan Kaya
> Sent: 16 November 2017 14:04
...
> The issue is two independent software entities are trying to recover the PCIe
> link simultaneously. AER and DPC have two different approaches to link recovery.
>
> AER makes a callback into the endpoint drivers for non-fatal errors and hope
> that endpoint driver can recover the link. AER also makes a callback in the
> fatal error case but resets the link via secondary bus reset.
>
> The DPC on the other hand stops the drivers immediately since HW took care of
> link disable. (Endpoint register reads return ~0 at this point.)

What happens if the 'user' driver doesn't define the error reporting callbacks?
It might be hardened against the ~0u returns from reads - so not OOPS.
It might be appropriate to call the remove() function instead.

> DPC driver clears
> the interrupt from the DPC capability and brings the link up at the end. Full
> enumeration/rescan follows this procedure to go back to functioning state.

That might not be a good idea, very likely it will fail again immediately.

	David