Re: [PATCH] arm64/acpi: Add fixup for HPE m400 quirks

Hi James,

Just for background, this is a well-known bug in the m400's APEI/HEST
firmware.  Several distros already carry their own fixes for it.  I put
together this patch to unify them into a common 'upstream' fix.

On 06/15/2018 04:14 AM, James Morse wrote:
> On 13/06/18 19:22, Geoff Levand wrote:
>> Adds a new ACPI init routine acpi_fixup_m400_quirks that adds
>> a work-around for HPE ProLiant m400 APEI firmware problems.
>>
>> The work-around disables APEI when CONFIG_ACPI_APEI is set and
>> m400 firmware is detected.  Without this fixup m400 systems
>> experience errors like these on startup:
>>
>>   [Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 2
>>   [Hardware Error]: event severity: fatal
>>   [Hardware Error]:  Error 0, type: fatal
>>   [Hardware Error]:   section_type: memory error
>>   [Hardware Error]:   error_status: 0x0000000000001300
> 
> "Access to a memory address which is not mapped to any component"
> 
> 
>>   [Hardware Error]:   error_type: 10, invalid address
>>   Kernel panic - not syncing: Fatal hardware error!
> 
> Why is this a problem?
> 
> Surely this is a valid description of an error.

The firmware bug causes this failure, not bad hardware.

> (okay its not particularly useful without the physical address, but the address
> is optional in that structure)
> 
> When does this happen during boot? This looks like a driver mapping some
> non-existent physical address space to see if its device is present...
> unsurprisingly this doesn't go well.
> (might also be a typo in the DSDT)
> 
> Can't we pin down the driver that does this and fix it. Its either wrong for
> everyone, or still broken after you disable APEI.
> 
> 
>> It seems unlikely there will be any m400 firmware updates to fix
>> this problem.
> 
> What is the problem? This patch looks like it shoots the messenger for bringing
> bad news.
 
The news is incorrect, so this patch disables the source (APEI code).

>> diff --git a/arch/arm64/kernel/acpi.c b/arch/arm64/kernel/acpi.c
>> index 7b09487ff8fb..3c315c2c7476 100644
>> --- a/arch/arm64/kernel/acpi.c
>> +++ b/arch/arm64/kernel/acpi.c
>> @@ -31,6 +31,8 @@
>>  #include <asm/cpu_ops.h>
>>  #include <asm/smp_plat.h>
>>  
>> +#include <acpi/apei.h>
>> +
>>  #ifdef CONFIG_ACPI_APEI
>>  # include <linux/efi.h>
>>  # include <asm/pgtable.h>
>> @@ -177,6 +179,33 @@ static int __init acpi_fadt_sanity_check(void)
>>  	return ret;
>>  }
>>  
>> +/*
>> + * acpi_fixup_m400_quirks - Work-around for HPE ProLiant m400 APEI firmware
>> + * problems.
>> + */
>> +static void __init acpi_fixup_m400_quirks(void)
>> +{
>> +	acpi_status status;
>> +	struct acpi_table_header *header;
>> +#if !defined(CONFIG_ACPI_APEI)
>> +	int hest_disable = HEST_DISABLED;
>> +#endif
> 
> Yuck.

Yes, unfortunately, the hest code conditionally defines hest_disable.

>> +
>> +	if (!IS_ENABLED(CONFIG_ACPI_APEI) || hest_disable != HEST_ENABLED)
>> +		return;
>> +
>> +	status = acpi_get_table(ACPI_SIG_HEST, 0, &header);
>> +
>> +	if (ACPI_SUCCESS(status) && !strncmp(header->oem_id, "HPE   ", 6) &&
>> +		!strncmp(header->oem_table_id, "ProLiant", 8) &&
> 
> You should match the affected range of OEM table revisions too, that way a
> firmware upgrade should start working, instead of being permanently disabled
> because we think its unlikely.

The m400 has reached end of life.  No one really expects to see any firmware
update.  I don't know what the affected OEM table revisions are, and I don't
think there is an active platform maintainer who could give that info either.

If someone can provide the info, I'll update the fix.
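
For illustration only, a revision-bounded match could look roughly like the
sketch below.  It is plain userspace C rather than kernel code: the struct is
a cut-down stand-in for struct acpi_table_header, the helper name is made up,
and the two revision bounds are placeholders, since the real affected range
is not known.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Cut-down stand-in for the kernel's struct acpi_table_header,
 * holding only the fields this sketch compares.
 */
struct table_header_sketch {
	char oem_id[6];        /* fixed width, space padded, no NUL */
	char oem_table_id[8];  /* fixed width, no NUL */
	uint32_t oem_revision;
};

/* Placeholder bounds: the real affected m400 revisions are unknown. */
#define M400_OEM_REV_FIRST 1u
#define M400_OEM_REV_LAST  2u

/* Hypothetical helper: true only for tables inside the affected range. */
static bool m400_hest_is_affected(const struct table_header_sketch *hdr)
{
	assert(hdr != NULL);

	return !memcmp(hdr->oem_id, "HPE   ", 6) &&
	       !memcmp(hdr->oem_table_id, "ProLiant", 8) &&
	       hdr->oem_revision >= M400_OEM_REV_FIRST &&
	       hdr->oem_revision <= M400_OEM_REV_LAST;
}
```

With a check of that shape, a table from a later firmware revision would fall
outside the range and APEI would stay enabled.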

>> +		MIDR_IMPLEMENTOR(read_cpuid_id()) == ARM_CPU_IMP_APM) {
> 
> How is the CPU implementer relevant?

That was just a copy of what other fixes had.  Should I remove it?

> You suggest a firmware-update would make this issue go away...
> 
> 
>> +		hest_disable = HEST_DISABLED;
>> +		pr_info("Disabled APEI for m400.\n");
>> +	}
>> +
>> +	acpi_put_table(header);
>> +}
>> +
>>  /*
>>   * acpi_boot_table_init() called from setup_arch(), always.
>>   *	1. find RSDP and get its address, and then find XSDT
> 
> Nothing arch-specific here. You're adding this to arch/arm64 because
> drivers/acpi/apei doesn't have an existing quirks table?

There was a fix submitted that had it in drivers/acpi/scan.c, but the
ACPI maintainer said he didn't want the fix in the main ACPI code.
See:

  https://lkml.org/lkml/2018/4/19/1020 (ACPI / scan: Fix regression related to X-Gene UARTs)

The m400 is an arm64 platform, so it seems most appropriate to
have it in arch/arm64/kernel/acpi.c.  I followed what was done
for the x86 quirks (in arch/x86/kernel/acpi/boot.c), and what was
suggested here:

  https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=900581 (linux: Enable Buster kernel features for newer ARM64 servers)

Thanks for the review.

-Geoff