What's the rationale for overriding the status returned by the err_detected
callback with the return value of reset_link in pcie_do_recovery? If
err_detected returns NEED_RESET and reset_link returns RECOVERED (as
dpc_reset_link does), then the driver's slot_reset callback won't be called.

	pci_dbg(dev, "broadcast error_detected message\n");
	if (state == pci_channel_io_frozen) {
		pci_walk_bus(bus, report_frozen_detected, &status);  <-- returns RESET
		status = reset_link(dev);  <--- call which returns RECOVERED
		if (status != PCI_ERS_RESULT_RECOVERED) {
			pci_warn(dev, "link reset failed\n");
			goto failed;

--James

> -----Original Message-----
> From: Keith Busch <kbusch@xxxxxxxxxx>
> Sent: Tuesday, March 16, 2021 5:52 PM
> To: James Puthukattukaran <james.puthukattukaran@xxxxxxxxxx>
> Cc: Kelley, Sean V <sean.v.kelley@xxxxxxxxx>; Kuppuswamy,
> Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxx>; Linux PCI
> <linux-pci@xxxxxxxxxxxxxxx>; bhelgaas@xxxxxxxxxx
> Subject: Re: [External] : Re: pci_do_recovery not handling fata errors
>
> On Tue, Mar 16, 2021 at 09:13:56PM +0000, James Puthukattukaran wrote:
> > Keith -
> > I understand that the RP did not detect the error and so there is
> > nothing to clear in its AER register. My question is - where is the
> > fatal error register cleared in the AER register of the device that
> > caused the fatal error? It does not seem to be done in
> > pci_do_recovery walking the hierarchy (unless I'm missing it)....
>
> Gotcha.
>
> All pci drivers that implement error handling should be calling
> pci_restore_state() somewhere from its .error_resume() callback, which
> invokes pci_aer_clear_status() to clear the device's AER status.
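
To make the status-overwrite question at the top of this message concrete, here is a minimal user-space sketch of the flow. It is a toy model, not kernel code: every name in it (ers_result, toy_driver, toy_error_detected, toy_reset_link, toy_slot_reset, recover_overwrite, recover_keep) is hypothetical, and it assumes that the later step in pcie_do_recovery only invokes slot_reset when status is NEED_RESET. recover_keep is only there for contrast and is not a proposed patch.

```c
/* Simplified stand-ins for the kernel's pci_ers_result_t values. */
enum ers_result {
	ERS_RESULT_NONE = 1,
	ERS_RESULT_CAN_RECOVER,
	ERS_RESULT_NEED_RESET,
	ERS_RESULT_DISCONNECT,
	ERS_RESULT_RECOVERED,
};

/* Hypothetical driver state: records whether slot_reset ran. */
struct toy_driver {
	int slot_reset_called;
};

/* Driver asks for a slot reset, as in the scenario described above. */
static enum ers_result toy_error_detected(void)
{
	return ERS_RESULT_NEED_RESET;
}

/* Link reset succeeds and reports RECOVERED (like dpc_reset_link). */
static enum ers_result toy_reset_link(void)
{
	return ERS_RESULT_RECOVERED;
}

static enum ers_result toy_slot_reset(struct toy_driver *drv)
{
	drv->slot_reset_called = 1;
	return ERS_RESULT_RECOVERED;
}

/* Models the quoted snippet: reset_link's result replaces the vote. */
static enum ers_result recover_overwrite(struct toy_driver *drv)
{
	enum ers_result status = toy_error_detected();

	status = toy_reset_link();	/* NEED_RESET is lost here */
	if (status != ERS_RESULT_RECOVERED)
		return ERS_RESULT_DISCONNECT;

	/* Assumed later step: slot_reset only runs on NEED_RESET. */
	if (status == ERS_RESULT_NEED_RESET)
		status = toy_slot_reset(drv);
	return status;
}

/* Contrast case (assumption): keep the driver's vote across the reset. */
static enum ers_result recover_keep(struct toy_driver *drv)
{
	enum ers_result status = toy_error_detected();

	if (toy_reset_link() != ERS_RESULT_RECOVERED)
		return ERS_RESULT_DISCONNECT;

	if (status == ERS_RESULT_NEED_RESET)
		status = toy_slot_reset(drv);
	return status;
}
```

Calling recover_overwrite() on a zeroed toy_driver leaves slot_reset_called at 0, which is the behaviour being asked about; recover_keep() sets it to 1.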