On Thu, Mar 4, 2021 at 2:38 PM Kuppuswamy, Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxx> wrote: > > > On 3/4/21 2:11 PM, Dan Williams wrote: > > On Thu, Mar 4, 2021 at 12:03 PM Keith Busch <kbusch@xxxxxxxxxx> wrote: > > On Tue, Mar 02, 2021 at 09:46:40PM -0800, Kuppuswamy, Sathyanarayanan wrote: > > On 3/2/21 9:34 PM, Williams, Dan J wrote: > > [ Add Sathya ] > > On Mon, 2021-01-04 at 15:02 -0800, Keith Busch wrote: > > Overwriting the frozen detected status with the result of the link reset > loses the NEED_RESET result that drivers are depending on for error > handling to report the .slot_reset() callback. Retain this status so > that subsequent error handling has the correct flow. > > Reported-by: Hinko Kocevar <hinko.kocevar@xxxxxx> > Acked-by: Sean V Kelley <sean.v.kelley@xxxxxxxxx> > Signed-off-by: Keith Busch <kbusch@xxxxxxxxxx> > > Just want to report that this fix might be a candidate for -stable. > > Agree. > > I think it can be merged in both stable and mainline kernels. > > Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> > > Just FYI, this patch is practically a revert of this one: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit?id=6d2c89441571ea534d6240f7724f518936c44f8d > > so please let me know if that is still a problem for you. > > For what it's worth I think "6d2c89441571 PCI/ERR: Update error status > after reset_link()" is not justified. The link shouldn't recover if > the attached device is not prepared to handle DPC events. > > I added that fix to address the recovery issue seen in a Dell server > platform (for EDR test case). If I understand the history correctly, > In EDR case, AER and DPC is owned by firmware, hence we get > PCI_ERS_RESULT_NO_AER_DRIVER when executing error_detected() callbacks. > So If we continue the pcie_do_recovery() with PCI_ERS_RESULT_NO_AER_DRIVER > as error status, then even if we successfully reset the link we will report > the recovery status as failure. But that's the right response if there is no handler. If there is a device attached that was not prepared for the link to go down then it does not matter if the link comes back ok that device will be thoroughly confused and should be disconnected.