pci_aer_clear_device_status() currently clears the device status
(PCI_EXP_DEVSTA) on the downstream port above a device, or on the port
itself if the port is the reported AER error source. This happens even
when error handling is firmware first.

Our interpretation is that firmware first handling means that the firmware
is responsible for clearing all relevant error reporting registers,
including this one.

Bjorn Helgaas reports that this has been clarified in sec 4.5.1 of:

  System Firmware Intermediary (SFI) _OSC and DPC Updates ECN, Feb 24,
  2020, affecting PCI Firmware Specification, Rev. 3.2
  https://members.pcisig.com/wg/PCI-SIG/document/14076

The call path that triggers this unwanted clear is:

  ghes_do_proc->
    ghes_handle_aer->
      aer_recover_queue->
        aer_recover_work_func->
          pcie_do_recovery->
            pci_aer_clear_device_status

I believe this extra status clear is probably harmless, so it is likely
not worth backporting. I'm not aware of any reports of issues caused by
it; I only identified it as incorrect during some emulated reset flow
testing.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@xxxxxxxxxx>
---
Changes since v1:
* As this is independent of the RCiEP APEI error handling patch, I have
  separated them.
* Rebased on mainline, including the change to the new handling of
  firmware first vs native handling.
* Added more detail to the patch description, including the reference
  Bjorn suggested.

 drivers/pci/pcie/aer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c
index 3acf56683915..c7cdeaff4350 100644
--- a/drivers/pci/pcie/aer.c
+++ b/drivers/pci/pcie/aer.c
@@ -245,6 +245,9 @@ void pci_aer_clear_device_status(struct pci_dev *dev)
 {
 	u16 sta;
 
+	if (!pcie_aer_is_native(dev))
+		return;
+
 	pcie_capability_read_word(dev, PCI_EXP_DEVSTA, &sta);
 	pcie_capability_write_word(dev, PCI_EXP_DEVSTA, sta);
 }
-- 
2.19.1
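
For readers who want to see the effect of the guard outside the kernel, here is a minimal userspace sketch. It is not kernel code: struct fake_dev, FAKE_DEVSTA_ERR_MASK and every fake_* function are hypothetical names invented for illustration, and the model assumes only that the four error-detected bits of the Device Status register are write-1-to-clear. It shows why reading the status and writing the same value back clears the set bits, and how an early return leaves them untouched when AER is not handled natively.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: only the low four "error detected" bits are modelled as RW1C. */
#define FAKE_DEVSTA_ERR_MASK 0x000f

/* Hypothetical stand-in for struct pci_dev; this is not the kernel type. */
struct fake_dev {
	bool aer_native;	/* true: OS owns AER; false: firmware first */
	uint16_t devsta;	/* models the error bits of PCI_EXP_DEVSTA */
};

/* Model a config write to RW1C bits: each 1 written clears that bit. */
void fake_write_devsta(struct fake_dev *dev, uint16_t val)
{
	dev->devsta &= (uint16_t)~(val & FAKE_DEVSTA_ERR_MASK);
}

/* Same shape as the patched function: do nothing unless AER is native. */
void fake_clear_device_status(struct fake_dev *dev)
{
	uint16_t sta;

	if (!dev->aer_native)
		return;

	sta = dev->devsta;		/* read the current status */
	fake_write_devsta(dev, sta);	/* write it back to clear the set bits */
}

/* Returns the status left behind after the clear attempt. */
uint16_t fake_status_after_clear(bool aer_native, uint16_t initial)
{
	struct fake_dev dev = { .aer_native = aer_native, .devsta = initial };

	fake_clear_device_status(&dev);
	return dev.devsta;
}
```

In this model, calling fake_status_after_clear(true, 0x0005) leaves no error bits set, while fake_status_after_clear(false, 0x0005) returns the status unchanged, so the firmware would still see the bits it is expected to clear.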