On Mon, 17 Jun 2024 15:04:04 -0500 Terry Bowman <terry.bowman@xxxxxxx> wrote: > The AER service driver clears the AER correctable error (CE) status before > calling the correctable error handler. This results in the error's status > not correctly reflected if read from the CE handler. > > The AER CE status is needed by the portdrv's CE handler. The portdrv's > CE handler is intended to only call the registered notifier callbacks > if the CE error status has correctable internal error (CIE) set. > > This is not a problem for AER uncorrrectbale errors (UCE). The UCE status uncorrectable > is still present in the AER capability and available for reading, if > needed, when the UCE handler is called. I'm seeing the clear in the DPC path for UCE. For other cases is it a side effect of the reset? > > Change the order of clearing the CE status and calling the CE handler. > Make it to call the CE handler first and then clear the CE status > after returning. > > Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx> > Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> > Cc: linux-pci@xxxxxxxxxxxxxxx Seems reasonable, but many gremlins around the ordering in these flows, so I'm to particularly confident. With that in mind. Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxx> > --- > drivers/pci/pcie/aer.c | 7 ++++--- > 1 file changed, 4 insertions(+), 3 deletions(-) > > diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c > index ac6293c24976..4dc03cb9aff0 100644 > --- a/drivers/pci/pcie/aer.c > +++ b/drivers/pci/pcie/aer.c > @@ -1094,9 +1094,6 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info) > * Correctable error does not need software intervention. > * No need to go through error recovery process. > */ > - if (aer) > - pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, > - info->status); > if (pcie_aer_is_native(dev)) { > struct pci_driver *pdrv = dev->driver; > > @@ -1105,6 +1102,10 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info) > pdrv->err_handler->cor_error_detected(dev); > pcie_clear_device_status(dev); > } > + if (aer) > + pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, > + info->status); > + > } else if (info->severity == AER_NONFATAL) > pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset); > else if (info->severity == AER_FATAL)