Hi Geoff,

On 13/06/18 19:22, Geoff Levand wrote:
> Adds a new ACPI init routine acpi_fixup_m400_quirks that adds
> a work-around for HPE ProLiant m400 APEI firmware problems.
>
> The work-around disables APEI when CONFIG_ACPI_APEI is set and
> m400 firmware is detected. Without this fixup m400 systems
> experience errors like these on startup:
>
> [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
> [Hardware Error]: event severity: fatal
> [Hardware Error]: Error 0, type: fatal
> [Hardware Error]: section_type: memory error
> [Hardware Error]: error_status: 0x0000000000001300 "Access to a memory address which is not mapped to any component"
> [Hardware Error]: error_type: 10, invalid address
> Kernel panic - not syncing: Fatal hardware error!

Why is this a problem? Surely this is a valid description of an error.
(Okay, it's not particularly useful without the physical address, but the
address is optional in that structure.)

When does this happen during boot?

This looks like a driver mapping some non-existent physical address space
to see if its device is present... unsurprisingly this doesn't go well.
(It might also be a typo in the DSDT.)

Can't we pin down the driver that does this and fix it? It's either wrong
for everyone, or still broken after you disable APEI.


> It seems unlikely there will be any m400 firmware updates to fix
> this problem.

What is the problem? This patch looks like it shoots the messenger for
bringing bad news.
> diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
> index 7b09487ff8fb..3c315c2c7476 100644
> --- a/arch/arm64/kernel/acpi.c
> +++ b/arch/arm64/kernel/acpi.c
> @@ -31,6 +31,8 @@
>  #include <asm/cpu_ops.h>
>  #include <asm/smp_plat.h>
>
> +#include <acpi/apei.h>
> +
>  #ifdef CONFIG_ACPI_APEI
>  # include <linux/efi.h>
>  # include <asm/pgtable.h>
> @@ -177,6 +179,33 @@ static int __init acpi_fadt_sanity_check(void)
>  	return ret;
>  }
>
> +/*
> + * acpi_fixup_m400_quirks - Work-around for HPE ProLiant m400 APEI firmware
> + * problems.
> + */
> +static void __init acpi_fixup_m400_quirks(void)
> +{
> +	acpi_status status;
> +	struct acpi_table_header *header;
> +#if !defined(CONFIG_ACPI_APEI)
> +	int hest_disable = HEST_DISABLED;
> +#endif

Yuck.

> +
> +	if (!IS_ENABLED(CONFIG_ACPI_APEI) || hest_disable != HEST_ENABLED)
> +		return;
> +
> +	status = acpi_get_table(ACPI_SIG_HEST, 0, &header);
> +
> +	if (ACPI_SUCCESS(status) && !strncmp(header->oem_id, "HPE ", 6) &&
> +	    !strncmp(header->oem_table_id, "ProLiant", 8) &&

You should match the affected range of OEM table revisions too. That way a
firmware upgrade would start working, instead of APEI being permanently
disabled because we think an upgrade is unlikely.

> +	    MIDR_IMPLEMENTOR(read_cpuid_id()) == ARM_CPU_IMP_APM) {

How is the CPU implementer relevant? You suggest a firmware update would
make this issue go away...

> +		hest_disable = HEST_DISABLED;
> +		pr_info("Disabled APEI for m400.\n");
> +	}
> +
> +	acpi_put_table(header);
> +}
> +
>  /*
>   * acpi_boot_table_init() called from setup_arch(), always.
>   * 1. find RSDP and get its address, and then find XSDT

Nothing arch-specific here. You're adding this to arch/arm64 because
drivers/acpi/apei doesn't have an existing quirks table?


Thanks,

James