RE: pci_do_recovery not handling fatal errors

But dpc_process_error() clears the fatal status only when the DPC trigger reason is "unmasked uncorrectable error" (reason == 0).
If the trigger reason is ERR_FATAL, that else-if branch is not taken, and the fatal status is not cleared in pci_do_recovery() either.

From dpc_process_error(), with more context:

        else if (reason == 0 &&  <<<<<<< only for "unmasked uncorrectable". What about ERR_FATAL?
                 dpc_get_aer_uncorrect_severity(pdev, &info) &&
                 aer_get_device_error_info(pdev, &info)) {
                aer_print_error(pdev, &info);
                pci_aer_clear_nonfatal_status(pdev);
                pci_aer_clear_fatal_status(pdev);
        }
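
To make the gap concrete, here is a minimal user-space sketch of the two paths as I read them. It is not kernel code: struct model_dev, the model_* helpers and fatal_left_set() are hypothetical names used only for illustration, and the Trigger Reason mask and encodings are assumed to follow the DPC Status register layout (bits 2:1). It also ignores the other two conditions in the branch above and collapses the port and the endpoint into a single pair of status flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: Trigger Reason is bits 2:1 of the DPC Status register. */
#define DPC_STATUS_TRIGGER_RSN 0x0006u

/* Assumed encodings of the Trigger Reason field. */
enum dpc_reason {
	DPC_REASON_UNMASKED_UNCOR = 0,	/* unmasked uncorrectable error */
	DPC_REASON_ERR_NONFATAL   = 1,	/* ERR_NONFATAL message received */
	DPC_REASON_ERR_FATAL      = 2,	/* ERR_FATAL message received */
	DPC_REASON_EXTENDED       = 3	/* see the Trigger Reason Extension field */
};

/* Hypothetical stand-in for the AER uncorrectable status of a device. */
struct model_dev {
	bool nonfatal_status;
	bool fatal_status;
};

/* Extract the Trigger Reason field from a DPC Status value. */
static unsigned int dpc_trigger_reason(uint16_t status)
{
	return (status & DPC_STATUS_TRIGGER_RSN) >> 1;
}

/* Models the branch quoted above: both statuses are cleared only for reason 0. */
static void model_dpc_process_error(struct model_dev *dev, uint16_t status)
{
	if (dpc_trigger_reason(status) == DPC_REASON_UNMASKED_UNCOR) {
		dev->nonfatal_status = false;
		dev->fatal_status = false;
	}
}

/* Models the pci_do_recovery() snippet: only the nonfatal status is cleared. */
static void model_do_recovery(struct model_dev *dev)
{
	dev->nonfatal_status = false;
}

/* Returns true if the fatal status is still set after both steps have run. */
static bool fatal_left_set(uint16_t status)
{
	struct model_dev dev = { .nonfatal_status = true, .fatal_status = true };

	model_dpc_process_error(&dev, status);
	model_do_recovery(&dev);
	return dev.fatal_status;
}
```

In this model, fatal_left_set(0x0000) (reason 0) returns false, while fatal_left_set(0x0004) (reason 2, ERR_FATAL) returns true, which is the case I am asking about.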
 

> -----Original Message-----
> From: Kelley, Sean V <sean.v.kelley@xxxxxxxxx>
> Sent: Friday, March 12, 2021 5:25 PM
> To: James Puthukattukaran <james.puthukattukaran@xxxxxxxxxx>;
> Kuppuswamy, Sathyanarayanan
> <sathyanarayanan.kuppuswamy@xxxxxxxxx>
> Cc: Linux PCI <linux-pci@xxxxxxxxxxxxxxx>; bhelgaas@xxxxxxxxxx
> Subject: [External] : Re: pci_do_recovery not handling fata errors
> 
> 
> 
> > On Mar 12, 2021, at 12:56 PM, James Puthukattukaran <james.puthukattukaran@xxxxxxxxxx> wrote:
> >
> > Hi -
> > I’m trying to understand why pci_do_recovery() only clears non-fatal but not
> > fatal errors? My immediate concern is the call from dpc_handler. If a device
> > sends an ERR_FATAL to the root port, I would think that, as part of recovery,
> > the fatal status in the AER registers of the endpoint device would be cleared?
> >
> 
> 
> Adding Sathya who mentioned to me that:
> 
> Fatal errors are cleared in
> 
> void dpc_process_error(struct pci_dev *pdev)
> 
>                  dpc_get_aer_uncorrect_severity(pdev, &info) &&
>                  aer_get_device_error_info(pdev, &info)) {
>                 aer_print_error(pdev, &info);
>                 pci_aer_clear_nonfatal_status(pdev);
>                 pci_aer_clear_fatal_status(pdev);
> 
> Thanks,
> 
> Sean
> 
> > Snippet of concern in pci_do_recovery –
> >
> >         /*
> >          * If we have native control of AER, clear error status in the Root
> >          * Port or Downstream Port that signaled the error.  If the
> >          * platform retained control of AER, it is responsible for clearing
> >          * this status.  In that case, the signaling device may not even be
> >          * visible to the OS.
> >          */
> >         if (host->native_aer || pcie_ports_native) {
> >                 pcie_clear_device_status(bridge);
> >                 pci_aer_clear_nonfatal_status(bridge);   <<<< Just clearing nonfatal. What about fatal?
> >         }
> >
> > Thanks
> > James




