On 6/20/24 06:31, Jonathan Cameron wrote: > On Mon, 17 Jun 2024 15:04:04 -0500 > Terry Bowman <terry.bowman@xxxxxxx> wrote: > >> The AER service driver clears the AER correctable error (CE) status before >> calling the correctable error handler. This results in the error's status >> not correctly reflected if read from the CE handler. >> >> The AER CE status is needed by the portdrv's CE handler. The portdrv's >> CE handler is intended to only call the registered notifier callbacks >> if the CE error status has correctable internal error (CIE) set. >> >> This is not a problem for AER uncorrrectbale errors (UCE). The UCE status > > uncorrectable > Thank you. >> is still present in the AER capability and available for reading, if >> needed, when the UCE handler is called. > > I'm seeing the clear in the DPC path for UCE. For other cases is > it a side effect of the reset? > Depends on when its being read. I'm assuming this is after recovery in your case. And after recovery it will be zeroed. Regards, Terry >> >> Change the order of clearing the CE status and calling the CE handler. >> Make it to call the CE handler first and then clear the CE status >> after returning. >> >> Signed-off-by: Terry Bowman <terry.bowman@xxxxxxx> >> Cc: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> >> Cc: linux-pci@xxxxxxxxxxxxxxx > Seems reasonable, but many gremlins around the ordering in these > flows, so I'm to particularly confident. With that in mind. > Reviewed-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxx> > >> --- >> drivers/pci/pcie/aer.c | 7 ++++--- >> 1 file changed, 4 insertions(+), 3 deletions(-) >> >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c >> index ac6293c24976..4dc03cb9aff0 100644 >> --- a/drivers/pci/pcie/aer.c >> +++ b/drivers/pci/pcie/aer.c >> @@ -1094,9 +1094,6 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info) >> * Correctable error does not need software intervention. >> * No need to go through error recovery process. >> */ >> - if (aer) >> - pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, >> - info->status); >> if (pcie_aer_is_native(dev)) { >> struct pci_driver *pdrv = dev->driver; >> >> @@ -1105,6 +1102,10 @@ static void pci_aer_handle_error(struct pci_dev *dev, struct aer_err_info *info) >> pdrv->err_handler->cor_error_detected(dev); >> pcie_clear_device_status(dev); >> } >> + if (aer) >> + pci_write_config_dword(dev, aer + PCI_ERR_COR_STATUS, >> + info->status); >> + >> } else if (info->severity == AER_NONFATAL) >> pcie_do_recovery(dev, pci_channel_io_normal, aer_root_reset); >> else if (info->severity == AER_FATAL) >