Hi Mark,

On 26/06/18 21:20, Mark Salter wrote:
> On Tue, 2018-06-26 at 15:51 +0100, James Morse wrote:
>> On 25/06/18 16:34, Mark Salter wrote:
>>> On Fri, 2018-06-22 at 11:19 -0400, Mark Salter wrote:
>>>> I'm going to hack something to get to the ghes info earlier in boot and
>>>> check the things you mention above wrt Error Status Block and GHES.0.
>>>
>>> So I had to end up instrumenting the EFI stub to see where the error came
>>> from. At the start of the stub, there is no GHES.2 error. The error first
>>> shows up after the stub's call to ExitBootServices returns.
>>
>> What's the notification type of GHES.2? I'm guessing POLLed or some kind of IRQ.
>> These systems don't have EL3, so the CPU must continue running while something
>> external generates the CPER records. The records being visible is the last point
>> the faulty-access could have been made, with the window of time depending on how
>> fast this external-thing receives and processes the error.
>
> There's a System Control Processor (slimpro) on the SoC which can interact with
> the CPU in various ways and which has access to memory and other hw.

Thanks, saves me guessing!


>>> So it looks
>>> like the firmware itself is causing the error. There's still a chance that
>>> the stub is doing something wrong with the memory map passed to the
>>> firmware, so I'll try to eliminate that as well.
>>
>> adding delay loops will help prove the EFIStub is innocent.
>
> Didn't change anything.

Okay, so just to clarify, a delay before ExitBootServices doesn't cause the error
to show up before ExitBootServices, so the error hasn't occurred prior to this
point.
And a delay after ExitBootServices allows us to see the error before we exit
into head.S. (this rules out a bug in head.S)

The delays should be long enough to tell us this slimpro isn't generating the
error records N seconds after reset.

Given this I agree we should disable_hest based on the DMI platform name and the
UEFI version number.
(it may be that earlier firmware didn't have this bug).

I don't have anything to test this on, so I've picked the DMI strings out of the
dmesg output on that bugzilla entry. Any chance you could give it a test?


>> Are redhat able to rebuild UEFI on these systems? (Can it be fixed?)
>> https://bugzilla.redhat.com/show_bug.cgi?id=1285107 is about the m400
>> description of the GIC, comments 15 and 16 show a UEFI patch to something other
>> than the upstream platforms tree[0], and new firmware being tested.
>> (although this may be wishful thinking)
>
> HPe would respond to bug reports until m400 reached EOL. They have been pretty
> clear that no more firmware updates will be done.

Thanks, it was a bit murky from that ticket...


Thanks for doing this!

James
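
For illustration only, the kind of check discussed above (ignore HEST only when
both the DMI platform name and the UEFI version match a known-bad pair) might be
sketched in plain user-space C as below. This is not the patch under test and it
does not use the kernel's DMI matching API. The struct, the function name and
both strings are hypothetical placeholders, not the values from the bugzilla
entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the two DMI fields the quirk would match on. */
struct fw_ident {
	const char *product_name;	/* DMI platform name */
	const char *bios_version;	/* UEFI version string */
};

/* Placeholder entries: the real DMI strings are deliberately not given. */
static const struct fw_ident hest_quirks[] = {
	{ "EXAMPLE-PLATFORM", "EXAMPLE-FW-1.0" },
};

/*
 * Return true only when both the platform name and the firmware version
 * match a listed pair, so other firmware versions on the same platform
 * keep HEST enabled.
 */
static bool should_disable_hest(const struct fw_ident *sys)
{
	size_t i;

	if (!sys || !sys->product_name || !sys->bios_version)
		return false;

	for (i = 0; i < sizeof(hest_quirks) / sizeof(hest_quirks[0]); i++) {
		if (!strcmp(sys->product_name, hest_quirks[i].product_name) &&
		    !strcmp(sys->bios_version, hest_quirks[i].bios_version))
			return true;
	}

	return false;
}
```

Matching on the version string as well as the platform name keeps the quirk
narrow, which matters if some firmware versions turn out not to have the bug.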