Re: pci_do_recovery not handling fatal errors

On Fri, Mar 12, 2021 at 10:57:18PM +0000, James Puthukattukaran wrote:
> But dpc_process_error() clears the fatal error status only when DPC is triggered by an "unmasked uncorrectable" error.
> If the trigger reason is ERR_FATAL, it does not hit the else clause, and the status is not cleared in pci_do_recovery() either.

If the reason is ERR_FATAL, then the port didn't detect the error; it is
just the first DPC capable downstream port to receive the message from
some device downstream, so there's nothing to clear in its AER register.
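
As a rough sketch of that distinction (not the kernel's code; the names
below are made up for illustration, and the layout assumes the DPC Status
register's Trigger Reason field in bits 2:1):

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: DPC Status register, Trigger Reason field in bits 2:1. */
#define DPC_STATUS_TRIGGER_RSN_MASK  0x0006u
#define DPC_STATUS_TRIGGER_RSN_SHIFT 1

/* Hypothetical enum naming the four Trigger Reason encodings. */
enum dpc_reason {
	DPC_REASON_UNMASKED_UNCOR = 0, /* the port itself detected the error */
	DPC_REASON_ERR_NONFATAL   = 1, /* ERR_NONFATAL message received */
	DPC_REASON_ERR_FATAL      = 2, /* ERR_FATAL message received */
	DPC_REASON_EXTENDED       = 3, /* see the Trigger Reason Extension */
};

/* Extract the trigger reason from a raw DPC Status value. */
static inline enum dpc_reason dpc_trigger_reason(uint16_t dpc_status)
{
	return (enum dpc_reason)((dpc_status & DPC_STATUS_TRIGGER_RSN_MASK)
				 >> DPC_STATUS_TRIGGER_RSN_SHIFT);
}

/*
 * Only when the port detected the error itself (reason 0) does its own
 * AER status hold anything to clear.  For ERR_FATAL/ERR_NONFATAL the port
 * merely received a message from a device below it.
 */
static inline bool dpc_port_has_own_aer_status(uint16_t dpc_status)
{
	return dpc_trigger_reason(dpc_status) == DPC_REASON_UNMASKED_UNCOR;
}
```

With this, a status value of 0x0005 (triggered, reason ERR_FATAL) reports
that the port has no AER status of its own to clear.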
 
> From dpc_process_error with more context -- 
> 
>        else if (reason == 0 &&  <<<<<<< only for "unmaskable uncorrectable". What about for ERR_FATAL?
>                  dpc_get_aer_uncorrect_severity(pdev, &info) &&
>                  aer_get_device_error_info(pdev, &info)) {
>                 aer_print_error(pdev, &info);
>                 pci_aer_clear_nonfatal_status(pdev);
>                 pci_aer_clear_fatal_status(pdev);
>         }
>  
> 
> > -----Original Message-----
> > From: Kelley, Sean V <sean.v.kelley@xxxxxxxxx>
> > Sent: Friday, March 12, 2021 5:25 PM
> > To: James Puthukattukaran <james.puthukattukaran@xxxxxxxxxx>;
> > Kuppuswamy, Sathyanarayanan
> > <sathyanarayanan.kuppuswamy@xxxxxxxxx>
> > Cc: Linux PCI <linux-pci@xxxxxxxxxxxxxxx>; bhelgaas@xxxxxxxxxx
> > Subject: [External] : Re: pci_do_recovery not handling fatal errors
> > 
> > 
> > 
> > > On Mar 12, 2021, at 12:56 PM, James Puthukattukaran <james.puthukattukaran@xxxxxxxxxx> wrote:
> > >
> > > Hi -
> > > I’m trying to understand why pci_do_recovery() only clears non-fatal but not
> > > fatal errors. My immediate concern is the call from dpc_handler. If a device
> > > sends an ERR_FATAL to the root port, I would think that as part of recovery
> > > the fatal status in the AER registers of the endpoint device would be cleared?
> > >
> > 
> > 
> > Adding Sathya who mentioned to me that:
> > 
> > Fatal errors are cleared in
> > 
> > void dpc_process_error(struct pci_dev *pdev)
> > 
> >                  dpc_get_aer_uncorrect_severity(pdev, &info) &&
> >                  aer_get_device_error_info(pdev, &info)) {
> >                 aer_print_error(pdev, &info);
> >                 pci_aer_clear_nonfatal_status(pdev);
> >                 pci_aer_clear_fatal_status(pdev);
> > 
> > Thanks,
> > 
> > Sean
> > 
> > > Snippet of concern in pci_do_recovery –
> > >
> > >         /*
> > >          * If we have native control of AER, clear error status in the Root
> > >          * Port or Downstream Port that signaled the error.  If the
> > >          * platform retained control of AER, it is responsible for clearing
> > >          * this status.  In that case, the signaling device may not even be
> > >          * visible to the OS.
> > >          */
> > >         if (host->native_aer || pcie_ports_native) {
> > >                 pcie_clear_device_status(bridge);
> > >                 pci_aer_clear_nonfatal_status(bridge);   <<<< Just clearing nonfatal. What about fatal?
> > >         }
> > >
> > > Thanks
> > > James
> 


