Re: [PATCH] arm64/acpi: Add fixup for HPE m400 quirks

Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> · Wed, 27 Jun 2018 10:48:58 +0200

On 26 June 2018 at 22:20, Mark Salter <msalter@xxxxxxxxxx> wrote:
> On Tue, 2018-06-26 at 15:51 +0100, James Morse wrote:
>> Hi Mark,
>>
>> Thanks for shedding some light on what is going on here!
>>
>> On 25/06/18 16:34, Mark Salter wrote:
>> > On Fri, 2018-06-22 at 11:19 -0400, Mark Salter wrote:
>> > > I'm going to hack something to get to the ghes info earlier in boot and
>> > > check the things you mention above wrt Error Status Block and GHES.0.
>> >
>> > So I had to end up instrumenting the EFI stub to see where the error came
>> > from. At the start of the stub, there is no GHES.2 error. The error first
>> > shows up after the stub's call to ExitBootServices returns.
>>
>> What's the notification type of GHES.2? I'm guessing POLLed or some kind of IRQ.
>
> SCI
>
> Here's the HEST entry:
>
> [028h 0040   2]                Subtable Type : 0009 [Generic Hardware Error Source]
> [02Ah 0042   2]                    Source Id : 0002
> [02Ch 0044   2]            Related Source Id : FFFF
> [02Eh 0046   1]                     Reserved : 00
> [02Fh 0047   1]                      Enabled : 01
> [030h 0048   4]       Records To Preallocate : 00000001
> [034h 0052   4]      Max Sections Per Record : 00000001
> [038h 0056   4]          Max Raw Data Length : 00000AEC
>
> [03Ch 0060  12]         Error Status Address : [Generic Address Structure]
> [03Ch 0060   1]                     Space ID : 00 [SystemMemory]
> [03Dh 0061   1]                    Bit Width : 40
> [03Eh 0062   1]                   Bit Offset : 00
> [03Fh 0063   1]         Encoded Access Width : 04 [QWord Access:64]
> [040h 0064   8]                      Address : 0000004FF7E9F0E0
>

This is a reserved region in the memory map. Does that apply to the
other occurrences as well?
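
For anyone following along, here is a minimal sketch of how the 12-byte
Generic Address Structure in the dump above decodes, and how one could
check the address against reserved regions. The field values are the ones
from the HEST entry quoted above; the reserved region in
HYPOTHETICAL_RESERVED is made up for illustration and is not taken from
the m400 memory map, and the helper names are my own.

```python
import struct
from typing import NamedTuple


class GenericAddress(NamedTuple):
    """ACPI Generic Address Structure (12 bytes, little-endian)."""
    space_id: int       # 0 = SystemMemory
    bit_width: int
    bit_offset: int
    access_width: int   # 1 = byte, 2 = word, 3 = dword, 4 = qword
    address: int


def parse_gas(raw: bytes) -> GenericAddress:
    """Unpack a raw 12-byte GAS into its five fields."""
    if len(raw) != 12:
        raise ValueError("a Generic Address Structure is exactly 12 bytes")
    return GenericAddress(*struct.unpack("<BBBBQ", raw))


def access_bits(gas: GenericAddress) -> int:
    """Translate the encoded access width into a size in bits."""
    if not 1 <= gas.access_width <= 4:
        raise ValueError("undefined or unsupported access width")
    return 8 << (gas.access_width - 1)


def in_regions(addr: int, size: int, regions) -> bool:
    """True if [addr, addr + size) lies entirely inside one (start, length) region."""
    return any(start <= addr and addr + size <= start + length
               for start, length in regions)


# Bytes rebuilt from the dump: Space ID 00, Bit Width 40, Bit Offset 00,
# Access Width 04, Address 0000004FF7E9F0E0.
RAW = bytes.fromhex("00400004") + (0x0000004FF7E9F0E0).to_bytes(8, "little")
gas = parse_gas(RAW)

# ASSUMPTION: illustrative region only, not the real firmware memory map.
HYPOTHETICAL_RESERVED = [(0x4FF7E90000, 0x20000)]

is_reserved = in_regions(gas.address, access_bits(gas) // 8, HYPOTHETICAL_RESERVED)
```

Running the same check over the other nine entries, with the real memory
map substituted for the placeholder, would answer the question above.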

> There are 9 others all identical except for Source ID and address.
>
>> These systems don't have EL3, so the CPU must continue running while something
>> external generates the CPER records. The records being visible is the last point
>> the faulty-access could have been made, with the window of time depending on how
>> fast this external-thing receives and processes the error.
>
> There's a System Control Processor (slimpro) on the SoC which can interact with
> the CPU in various ways and which has access to memory and other hw.
>
>>
>>
>> > So it looks
>> > like the firmware itself is causing the error. There's still a chance that
>> > the stub is doing something wrong with the memory map passed to the
>> > firmware, so I'll try to eliminate that as well.
>>
>> Adding delay loops will help prove the EFIStub is innocent.
>
> Didn't change anything.
>
>>
>> Are there any optional drivers being loaded by UEFI? (can you remove any USB
>> mass storage drives for instance).
>
> The only storage is PCI based. There is a USB port, but it doesn't look like
> anything is attached to it. I don't have physical access to it. It is one of
> many Moonshot cartridges in a chassis several hundred miles away.
>
>>
>> Are Red Hat able to rebuild UEFI on these systems? (Can it be fixed?)
>
> No.
>
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=1285107 is about the m400
>> description of the GIC, comments 15 and 16 show a UEFI patch to something other
>> than the upstream platforms tree[0], and new firmware being tested.
>> (although this may be wishful thinking)
>
> HPE would respond to bug reports until the m400 reached EOL. They have been
> pretty clear that no more firmware updates will be done.
>
>>
>> It looks like quirking this based on the DMI platform name and UEFI version will
>> be what we need. We could discard anything in the error status block areas at
>> ghes_probe() time based on this quirk, but we may have missed other problems
>> during boot, giving a false sense of security.
>>
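
To make the proposal above concrete, here is a rough model of the idea in
Python rather than kernel C: match on DMI product name plus firmware
version, and if the platform is on the quirk list, clear a non-empty error
status block at probe time. This is only a sketch under assumptions. The
quirk strings are placeholders, not the real m400 DMI values, and
needs_quirk() / discard_stale_status() are hypothetical names, not
existing kernel functions. The only layout fact relied on is that Block
Status is the first 32-bit field of a Generic Error Status Block.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class DmiInfo:
    product_name: str
    bios_version: str


# ASSUMPTION: placeholder strings, not the actual m400 DMI data.
QUIRKS = {("EXAMPLE-PRODUCT-NAME", "EXAMPLE-FW-VERSION")}


def needs_quirk(dmi: DmiInfo) -> bool:
    """True if this exact product/firmware pair is on the quirk list."""
    return (dmi.product_name, dmi.bios_version) in QUIRKS


def discard_stale_status(block: bytearray, dmi: DmiInfo) -> bool:
    """Clear Block Status (first little-endian u32) on quirked platforms.

    Returns True if a pending record was discarded, False otherwise.
    """
    (status,) = struct.unpack_from("<I", block, 0)
    if status == 0 or not needs_quirk(dmi):
        return False
    struct.pack_into("<I", block, 0, 0)
    return True
```

Matching on the firmware version as well as the product name keeps the
quirk from hiding errors on any firmware that does not show the problem.
It does not address the concern that other problems during boot may have
been missed.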
>>
>> Thanks,
>>
>> James
>>
>>
>> [0] Might be wrong, but this is where I look:
>> https://github.com/tianocore/edk2-platforms.git
>
>