On Fri, Mar 12, 2021 at 10:57:18PM +0000, James Puthukattukaran wrote:
> But the clearing of fatal error in the dpc_process_error is only for
> DPC trigger due to "unmaskable uncorrectable".
> If the trigger reason is ERR_FATAL, then it does not hit the else
> clause and neither is it cleared in the pci_do_recovery code.

If the reason is ERR_FATAL, then the port didn't detect the error; it is
just the first DPC capable downstream port to receive the message from
some device downstream, so there's nothing to clear in its AER register.

> From dpc_process_error with more context --
>
>         else if (reason == 0 &&      <<<<<<< only for "unmaskable uncorrectable". What about for ERR_FATAL?
>                  dpc_get_aer_uncorrect_severity(pdev, &info) &&
>                  aer_get_device_error_info(pdev, &info)) {
>                 aer_print_error(pdev, &info);
>                 pci_aer_clear_nonfatal_status(pdev);
>                 pci_aer_clear_fatal_status(pdev);
>         }
>
> > -----Original Message-----
> > From: Kelley, Sean V <sean.v.kelley@xxxxxxxxx>
> > Sent: Friday, March 12, 2021 5:25 PM
> > To: James Puthukattukaran <james.puthukattukaran@xxxxxxxxxx>;
> > Kuppuswamy, Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxx>
> > Cc: Linux PCI <linux-pci@xxxxxxxxxxxxxxx>; bhelgaas@xxxxxxxxxx
> > Subject: [External] : Re: pci_do_recovery not handling fata errors
> >
> > > On Mar 12, 2021, at 12:56 PM, James Puthukattukaran
> > > <james.puthukattukaran@xxxxxxxxxx> wrote:
> > >
> > > Hi -
> > > I’m trying to understand why pci_do_recovery() only clears non-fatal
> > > but not fatal errors? My immediate concern is the call from
> > > dpc_handler. If a device sends an ERR_FATAL to the root port, I would
> > > think that as part of recovery the fatal status in the AER registers
> > > of the endpoint device would be cleared?
> > >
> >
> > Adding Sathya who mentioned to me that:
> >
> > Fatal errors are cleared in
> >
> > void dpc_process_error(struct pci_dev *pdev)
> >
> >                  dpc_get_aer_uncorrect_severity(pdev, &info) &&
> >                  aer_get_device_error_info(pdev, &info)) {
> >                 aer_print_error(pdev, &info);
> >                 pci_aer_clear_nonfatal_status(pdev);
> >                 pci_aer_clear_fatal_status(pdev);
> >
> > Thanks,
> >
> > Sean
> >
> > > Snippet of concern in pci_do_recovery –
> > >
> > >         /*
> > >          * If we have native control of AER, clear error status in the Root
> > >          * Port or Downstream Port that signaled the error.  If the
> > >          * platform retained control of AER, it is responsible for clearing
> > >          * this status.  In that case, the signaling device may not even be
> > >          * visible to the OS.
> > >          */
> > >         if (host->native_aer || pcie_ports_native) {
> > >                 pcie_clear_device_status(bridge);
> > >                 pci_aer_clear_nonfatal_status(bridge);  <<<< Just clearing nonfatal. What about fatal?
> > >         }
> > >
> > > Thanks
> > > James
>
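
For readers following the "reason == 0" discussion above, here is a
minimal standalone sketch of how the DPC trigger reason separates "this
port detected the error" from "this port only received a message from
below". It is an illustration, not kernel code. The mask value mirrors
PCI_EXP_DPC_STATUS_TRIGGER_RSN from the kernel's pci_regs.h; the enum
names and the helpers dpc_trigger_reason(), port_has_own_aer_status()
and self_check() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * DPC Status register, Trigger Reason field (bits 2:1).  The mask
 * mirrors PCI_EXP_DPC_STATUS_TRIGGER_RSN in the kernel's pci_regs.h.
 */
#define DPC_STATUS_TRIGGER_RSN	0x0006

/* Hypothetical names for the four encodings of the field. */
enum dpc_reason {
	DPC_REASON_UNMASKED_UNCOR = 0,	/* the port itself detected the error */
	DPC_REASON_ERR_NONFATAL   = 1,	/* ERR_NONFATAL message received from below */
	DPC_REASON_ERR_FATAL      = 2,	/* ERR_FATAL message received from below */
	DPC_REASON_EXTENDED       = 3,	/* see the Trigger Reason Extension field */
};

/* Extract the trigger reason the same way the quoted code derives 'reason'. */
static enum dpc_reason dpc_trigger_reason(uint16_t status)
{
	return (enum dpc_reason)((status & DPC_STATUS_TRIGGER_RSN) >> 1);
}

/*
 * Hypothetical helper: true only when the port detected the error
 * itself (reason 0), i.e. the one case where the quoted branch reads
 * and clears the port's own AER uncorrectable status.
 */
static bool port_has_own_aer_status(uint16_t status)
{
	return dpc_trigger_reason(status) == DPC_REASON_UNMASKED_UNCOR;
}

/* Usage example: bit 0 is the DPC Trigger Status bit. */
static void self_check(void)
{
	/* Triggered, reason 0: the port logged the error in its own AER. */
	assert(port_has_own_aer_status(0x0001));
	/* Triggered, reason 2 (ERR_FATAL): nothing logged in the port. */
	assert(!port_has_own_aer_status(0x0005));
}
```

Under this reading, a status with reason 2 skips the clearing branch by
design: the error was logged by some device below the port, not by the
port.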