This started as a nudge from Keith, who pointed out that it doesn't make
sense to disable AER services when only one device has a FIRMWARE_FIRST
HEST. I won't re-phrase the points in the original patch [1].

The patch started a long discussion in the ACPI Software Working Group
(ASWG). The nearly unanimous conclusion is that my original
interpretation is correct. I'd like to quote one of the tables that was
produced as part of that conversation:

(_OSC AER Control, HEST AER Structure FFS) = (0, 0)
 * OSPM is prevented from writing to the PCI Express AER registers.
 * OSPM has no guidance on how AER errors are being handled - but it
   does know that it is not in control of AER registers. PCI-e errors
   that make it to the OS (via NMI, etc) would be treated as spurious
   since access to the AER registers isn't allowed for proper sourcing.

(_OSC AER Control, HEST AER Structure FFS) = (0, 1)
 * OSPM is prevented from writing to the PCI Express AER registers.
 * OSPM is being given guidance that Firmware is handling AER errors
   and those interrupts are routed to the platform. Firmware may pass
   along error information via GHES.

(_OSC AER Control, HEST AER Structure FFS) = (0, Does not exist)
 * OSPM is prevented from writing to the PCI Express AER registers.
 * OSPM has no guidance on how AER errors are being handled - but it
   does know that it is not in control of AER registers. PCI-e errors
   that make it to the OS (via NMI, etc) would be treated as spurious
   since access to the AER registers isn't allowed for proper sourcing.

(_OSC AER Control, HEST AER Structure FFS) = (1, 0)
 * OSPM is in control of writing to the PCI Express AER registers.
 * OSPM is being given guidance that AER errors will interrupt the OS
   directly and that the OS is expected to handle all AER capability
   structure read/clears for the devices with this attribute (or all if
   the Global Bit is set.)

(_OSC AER Control, HEST AER Structure FFS) = (1, 1)
 * OSPM is in control of writing to the PCI Express AER registers.
 * OSPM is being given guidance that although OS is in control of AER
   read/writes - the actual interrupt is being routed to the platform
   first.
 * Subsequent fields with masks/enables should be performed by the OS
   during initialization on behalf of firmware. These are to be honoured
   in this mode because with FF, the firmware needs to be able to handle
   the errors it expects and not be given errors it was not expecting to
   handle.
 * Firmware may pass along error information via GHES, or generate an
   OS interrupt and allow the OS to interrogate AER status directly via
   the AER capability structures.

(_OSC AER Control, HEST AER Structure FFS) = (1, Does not exist)
 * OSPM is in control of writing to the PCI Express AER registers.
 * OSPM has no guidance from the platform and is in complete control of
   AER error handling.

There may be one caveat. Someone mentioned in the original discussions
that there may exist machines which make the assumption that HEST is
authoritative, but did not identify any such machine. We should keep in
mind that they may require a quirk.

Alex

[1] https://lkml.org/lkml/2018/11/16/202

Changes since v1:
 * Started 6-month conversation in ASWG
 * Re-phrased commit message to reflect some of the points in ASWG
   discussion

Alexandru Gagniuc (2):
  PCI/AER: Do not use APEI/HEST to disable AER services globally
  PCI/AER: Determine AER ownership based on _OSC instead of HEST

 drivers/acpi/pci_root.c  |  9 +----
 drivers/pci/pcie/aer.c   | 82 ++--------------------------------------
 include/linux/pci-acpi.h |  6 ---
 3 files changed, 5 insertions(+), 92 deletions(-)

-- 
2.19.2