On 4/28/2020 3:37 PM, Bjorn Helgaas wrote: > [EXTERNAL EMAIL] > > [+to Mario, Austin, Rafael; Dell folks, I suspect this commit will > break Dell servers but I'd like your opinion] > > <snip> Thanks Bjorn, for the heads up. I checked with our server BIOS team and they say that only checking _OSC for AER should work on our servers. We always configure_OSC and the HEST FIRMWARE_FIRST flag to retain firmware control of AER so either could be checked. > I *really* want the patch because the current mix of using both _OSC > and FIRMWARE_FIRST to determine AER capability ownership is a mess and > getting worse, but I'm more and more doubtful. > > My contention is that if firmware doesn't want the OS to use the AER > capability it should simply decline to grant control via _OSC. I agree per spec that _OSC should be used and this was confirmed by the ACPI working group. Alex had submitted a patch for us [2] to switch to using _OSC to determine AER ownership following the decision in the ACPI working group. [2] https://lkml.org/lkml/2018/11/16/202 > > But things like 0584396157ad ("PCI: PCIe AER: honor ACPI HEST FIRMWARE > FIRST mode") [1] suggest that some machines grant AER control to the > OS via _OSC, but still expect the OS to leave AER alone for certain > devices. AFAIK, no Dell server, including the 11G servers mentioned in that patch, have granted control of AER via _OSC and set HEST FIRMWARE_FIRST for some devices. I don't think this model is even support by the ACPI/PCIe standards. Yes, you can set the bits that way, but there is no text I've found that says how the OS/firmware should behave in that scenario. In order to be interoperable, I think someone would need to standardized how the OS/firmware would could co-ordinate in such a model. > > I think the FIRMWARE_FIRST language in the ACPI spec is really too > vague to tell the OS not to use the AER Capability, but it seems like > that's what commits like [1] rely on. > > The current _OSC definition (PCI Firmware r3.2) applies only to > PNP0A03/PNP0A08 devices, but it's conceivable that it could be > extended to other devices if we need per-device AER Capability > ownership. > > [1] https://git.kernel.org/linus/0584396157ad > >> commit 8f8e42e7c2dd ("PCI/AER: Use only _OSC to determine AER ownership") >> Author: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> >> Date: Mon Apr 27 18:25:13 2020 -0500 >> >> PCI/AER: Use only _OSC to determine AER ownership >> >> Per the PCI Firmware spec, r3.2, sec 4.5.1, the OS can request control of >> AER via bit 3 of the _OSC Control Field. In the returned value of the >> Control Field: >> >> The firmware sets [bit 3] to 1 to grant control over PCI Express Advanced >> Error Reporting. ... after control is transferred to the operating >> system, firmware must not modify the Advanced Error Reporting Capability. >> If control of this feature was requested and denied or was not requested, >> firmware returns this bit set to 0. >> >> Previously the pci_root driver looked at the HEST FIRMWARE_FIRST bit to >> determine whether to request ownership of the AER Capability. This was >> based on ACPI spec v6.3, sec 18.3.2.4, and similar sections, which say >> things like: >> >> Bit [0] - FIRMWARE_FIRST: If set, indicates that system firmware will >> handle errors from this source first. >> >> Bit [1] - GLOBAL: If set, indicates that the settings contained in this >> structure apply globally to all PCI Express Devices. >> >> These ACPI references don't say anything about ownership of the AER >> Capability. >> >> Remove use of the FIRMWARE_FIRST bit and rely only on the _OSC bit to >> determine whether we have control of the AER Capability. >> >> Link: https://lore.kernel.org/r/67af2931705bed9a588b5a39d369cb70b9942190.1587925636.git.sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx >> [bhelgaas: commit log, split patches] >> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@xxxxxxxxxxxxxxx> >> Signed-off-by: Bjorn Helgaas <bhelgaas@xxxxxxxxxx> >> >> diff --git a/drivers/acpi/pci_root.c b/drivers/acpi/pci_root.c >> index ac8ad6cb82aa..9e235c1a75ff 100644 >> --- a/drivers/acpi/pci_root.c >> +++ b/drivers/acpi/pci_root.c >> @@ -483,13 +483,8 @@ static void negotiate_os_control(struct acpi_pci_root *root, int *no_aspm, >> if (IS_ENABLED(CONFIG_HOTPLUG_PCI_SHPC)) >> control |= OSC_PCI_SHPC_NATIVE_HP_CONTROL; >> >> - if (pci_aer_available()) { >> - if (aer_acpi_firmware_first()) >> - dev_info(&device->dev, >> - "PCIe AER handled by firmware\n"); >> - else >> - control |= OSC_PCI_EXPRESS_AER_CONTROL; >> - } >> + if (pci_aer_available()) >> + control |= OSC_PCI_EXPRESS_AER_CONTROL; >> >> /* >> * Per the Downstream Port Containment Related Enhancements ECN to >> diff --git a/drivers/pci/pcie/aer.c b/drivers/pci/pcie/aer.c >> index f4274d301235..efc26773cc6d 100644 >> --- a/drivers/pci/pcie/aer.c >> +++ b/drivers/pci/pcie/aer.c >> @@ -318,30 +318,6 @@ int pcie_aer_get_firmware_first(struct pci_dev *dev) >> aer_set_firmware_first(dev); >> return dev->__aer_firmware_first; >> } >> - >> -static bool aer_firmware_first; >> - >> -/** >> - * aer_acpi_firmware_first - Check if APEI should control AER. >> - */ >> -bool aer_acpi_firmware_first(void) >> -{ >> - static bool parsed = false; >> - struct aer_hest_parse_info info = { >> - .pci_dev = NULL, /* Check all PCIe devices */ >> - .firmware_first = 0, >> - }; >> - >> - if (pcie_ports_native) >> - return false; >> - >> - if (!parsed) { >> - apei_hest_parse(aer_hest_parse, &info); >> - aer_firmware_first = info.firmware_first; >> - parsed = true; >> - } >> - return aer_firmware_first; >> -} >> #endif >> >> #define PCI_EXP_AER_FLAGS (PCI_EXP_DEVCTL_CERE | PCI_EXP_DEVCTL_NFERE | \ >> @@ -1523,7 +1499,7 @@ static struct pcie_port_service_driver aerdriver = { >> */ >> int __init pcie_aer_init(void) >> { >> - if (!pci_aer_available() || aer_acpi_firmware_first()) >> + if (!pci_aer_available()) >> return -ENXIO; >> return pcie_port_service_register(&aerdriver); >> } >> diff --git a/include/linux/pci-acpi.h b/include/linux/pci-acpi.h >> index 2d155bfb8fbf..11c98875538a 100644 >> --- a/include/linux/pci-acpi.h >> +++ b/include/linux/pci-acpi.h >> @@ -125,10 +125,4 @@ static inline void acpi_pci_add_bus(struct pci_bus *bus) { } >> static inline void acpi_pci_remove_bus(struct pci_bus *bus) { } >> #endif /* CONFIG_ACPI */ >> >> -#ifdef CONFIG_ACPI_APEI >> -extern bool aer_acpi_firmware_first(void); >> -#else >> -static inline bool aer_acpi_firmware_first(void) { return false; } >> -#endif >> - >> #endif /* _PCI_ACPI_H_ */