RE: [External] : Re: pci_do_recovery not handling fatal errors

Keith -
I understand that the RP did not detect the error, so there is nothing to clear in its AER register. My question is: where is the fatal error status cleared in the AER register of the device that caused the fatal error? It does not seem to be done in pci_do_recovery while walking the hierarchy (unless I'm missing it).
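
To make the question concrete, here is a minimal user-space sketch of what "clearing fatal" versus "clearing non-fatal" status means at the register level. It assumes the AER Uncorrectable Error Status register is write-1-to-clear and that a status bit counts as fatal when the matching bit in the Uncorrectable Error Severity register is set, which is how I read the kernel's pci_aer_clear_fatal_status() and pci_aer_clear_nonfatal_status(). The names fake_aer, rw1c_write, fake_clear_fatal and fake_clear_nonfatal are hypothetical and only model the registers; they are not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of two AER registers of one device (not a kernel struct). */
struct fake_aer {
	uint32_t uncor_status;	/* Uncorrectable Error Status, modeled as RW1C */
	uint32_t uncor_sever;	/* Uncorrectable Error Severity: 1 = fatal */
};

/* RW1C write: every 1 bit written clears that status bit; 0 bits are ignored. */
void rw1c_write(uint32_t *reg, uint32_t val)
{
	assert(reg != 0);
	*reg &= ~val;
}

/* Clear only the status bits whose severity is configured as fatal. */
void fake_clear_fatal(struct fake_aer *aer)
{
	uint32_t status;

	assert(aer != 0);
	status = aer->uncor_status & aer->uncor_sever;
	if (status)
		rw1c_write(&aer->uncor_status, status);
}

/* Clear only the status bits whose severity is configured as non-fatal. */
void fake_clear_nonfatal(struct fake_aer *aer)
{
	uint32_t status;

	assert(aer != 0);
	status = aer->uncor_status & ~aer->uncor_sever;
	if (status)
		rw1c_write(&aer->uncor_status, status);
}
```

With an illustrative status of 0x00040010 and severity of 0x00040000, the non-fatal clear leaves 0x00040000 set, and only the fatal clear brings the register to zero. That leftover bit on the endpoint is what I am asking about.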


> -----Original Message-----
> From: Keith Busch <kbusch@xxxxxxxxxx>
> Sent: Saturday, March 13, 2021 12:12 PM
> To: James Puthukattukaran <james.puthukattukaran@xxxxxxxxxx>
> Cc: Kelley, Sean V <sean.v.kelley@xxxxxxxxx>; Kuppuswamy,
> Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxx>; Linux PCI
> <linux-pci@xxxxxxxxxxxxxxx>; bhelgaas@xxxxxxxxxx
> Subject: [External] : Re: pci_do_recovery not handling fata errors
> 
> On Fri, Mar 12, 2021 at 10:57:18PM +0000, James Puthukattukaran wrote:
> > But the clearing of the fatal error in dpc_process_error is only for a DPC
> > trigger due to "unmasked uncorrectable".
> > If the trigger reason is ERR_FATAL, then it does not hit the else clause,
> > and it is not cleared in the pci_do_recovery code either.
> 
> If the reason is ERR_FATAL, then the port didn't detect the error; it is just the
> first DPC capable downstream port to receive the message from some device
> downstream, so there's nothing to clear in its AER register.
> 
> > From dpc_process_error with more context --
> >
> >         else if (reason == 0 &&  <<<<<<< only for "unmasked uncorrectable". What about for ERR_FATAL?
> >                  dpc_get_aer_uncorrect_severity(pdev, &info) &&
> >                  aer_get_device_error_info(pdev, &info)) {
> >                 aer_print_error(pdev, &info);
> >                 pci_aer_clear_nonfatal_status(pdev);
> >                 pci_aer_clear_fatal_status(pdev);
> >         }
> >
> >
> > > -----Original Message-----
> > > From: Kelley, Sean V <sean.v.kelley@xxxxxxxxx>
> > > Sent: Friday, March 12, 2021 5:25 PM
> > > To: James Puthukattukaran <james.puthukattukaran@xxxxxxxxxx>;
> > > Kuppuswamy, Sathyanarayanan
> > > <sathyanarayanan.kuppuswamy@xxxxxxxxx>
> > > Cc: Linux PCI <linux-pci@xxxxxxxxxxxxxxx>; bhelgaas@xxxxxxxxxx
> > > Subject: [External] : Re: pci_do_recovery not handling fata errors
> > >
> > >
> > >
> > > > On Mar 12, 2021, at 12:56 PM, James Puthukattukaran
> > > > <james.puthukattukaran@xxxxxxxxxx> wrote:
> > > >
> > > > Hi -
> > > > I’m trying to understand why pci_do_recovery() only clears non-fatal
> > > > but not fatal errors. My immediate concern is the call from
> > > > dpc_handler. If a device sends an ERR_FATAL to the root port, I would
> > > > think that, as part of recovery, the fatal status in the AER registers
> > > > of the endpoint device would be cleared?
> > > >
> > >
> > >
> > > Adding Sathya who mentioned to me that:
> > >
> > > Fatal errors are cleared in
> > >
> > > void dpc_process_error(struct pci_dev *pdev)
> > >
> > >                  dpc_get_aer_uncorrect_severity(pdev, &info) &&
> > >                  aer_get_device_error_info(pdev, &info)) {
> > >                 aer_print_error(pdev, &info);
> > >                 pci_aer_clear_nonfatal_status(pdev);
> > >                 pci_aer_clear_fatal_status(pdev);
> > >
> > > Thanks,
> > >
> > > Sean
> > >
> > > > Snippet of concern in pci_do_recovery –
> > > >
> > > >         /*
> > > >          * If we have native control of AER, clear error status in the Root
> > > >          * Port or Downstream Port that signaled the error.  If the
> > > >          * platform retained control of AER, it is responsible for clearing
> > > >          * this status.  In that case, the signaling device may not even be
> > > >          * visible to the OS.
> > > >          */
> > > >         if (host->native_aer || pcie_ports_native) {
> > > >                 pcie_clear_device_status(bridge);
> > > >                 pci_aer_clear_nonfatal_status(bridge);   <<<< Just clearing nonfatal. What about fatal?
> > > >         }
> > > >
> > > > Thanks
> > > > James
> >
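
As a footnote to the thread, here is a minimal sketch of how the DPC trigger reason discussed above could be decoded. It assumes the DPC Status register layout as I understand it from the PCIe specification: Trigger Status in bit 0 and Trigger Reason in bits 2:1, with 0 = unmasked uncorrectable error detected by the port itself, 1 = ERR_NONFATAL received, 2 = ERR_FATAL received, 3 = see the extended reason field. The macro, enum and function names are hypothetical, not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed mask for the Trigger Reason field (bits 2:1 of DPC Status). */
#define FAKE_DPC_STATUS_TRIGGER_RSN 0x0006u

/* Hypothetical names for the four trigger reason encodings. */
enum fake_dpc_reason {
	FAKE_DPC_RSN_UNMASKED_UNCOR = 0,	/* port itself detected the error */
	FAKE_DPC_RSN_ERR_NONFATAL   = 1,	/* ERR_NONFATAL message received */
	FAKE_DPC_RSN_ERR_FATAL      = 2,	/* ERR_FATAL message received */
	FAKE_DPC_RSN_EXTENDED       = 3		/* see the extended reason field */
};

/* Extract the trigger reason from a raw DPC Status value. */
enum fake_dpc_reason fake_dpc_reason(uint16_t dpc_status)
{
	return (enum fake_dpc_reason)((dpc_status & FAKE_DPC_STATUS_TRIGGER_RSN) >> 1);
}

/*
 * Only reason 0 means the DPC port logged the error in its own AER
 * registers; for ERR_FATAL/ERR_NONFATAL the error was logged by some
 * device below the port, so the port's AER status has nothing to clear.
 */
bool fake_port_has_own_aer_status(uint16_t dpc_status)
{
	return fake_dpc_reason(dpc_status) == FAKE_DPC_RSN_UNMASKED_UNCOR;
}
```

For a status value of 0x0005 (triggered, reason 2) the second helper returns false. That matches Keith's point that the port has nothing to clear in that case, and it leaves open where the downstream device's own fatal status gets cleared.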



