Hi Mark,

Thanks for shedding some light on what is going on here!

On 25/06/18 16:34, Mark Salter wrote:
> On Fri, 2018-06-22 at 11:19 -0400, Mark Salter wrote:
>> I'm going to hack something to get to the ghes info earlier in boot and
>> check the things you mention above wrt Error Status Block and GHES.0.
>
> So I had to end up instrumenting the EFI stub to see where the error came
> from. At the start of the stub, there is no GHES.2 error. The error first
> shows up after the stub's call to ExitBootServices returns.

What's the notification type of GHES.2? I'm guessing POLLed or some kind
of IRQ.

These systems don't have EL3, so the CPU must continue running while
something external generates the CPER records. The records becoming
visible is the last point at which the faulty access could have been
made, with the window of time depending on how fast this external thing
receives and processes the error.

> So it looks
> like the firmware itself is causing the error. There's still a chance that
> the stub is doing something wrong with the memory map passed to the
> firmware, so I'll try to eliminate that as well.

Adding delay loops will help prove the EFI stub is innocent.

Are there any optional drivers being loaded by UEFI? (Can you remove any
USB mass storage drives, for instance?)

Are Red Hat able to rebuild UEFI on these systems? (Can it be fixed?)
https://bugzilla.redhat.com/show_bug.cgi?id=1285107 is about the m400
description of the GIC; comments 15 and 16 show a UEFI patch to something
other than the upstream platforms tree[0], and new firmware being tested.
(Although this may be wishful thinking.)

It looks like quirking this based on the DMI platform name and UEFI
version will be what we need. We could discard anything in the error
status block areas at ghes_probe() time based on this quirk, but we may
then have missed other problems during boot, giving a false sense of
security.
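
To make the quirk idea concrete, here is a rough userspace model of the
logic, not kernel code. Everything in it is an assumption for
illustration: the struct is a cut-down stand-in for the error status
block header, the table strings are placeholders rather than real DMI
values, and the function names are hypothetical. A real version would
presumably match on DMI fields from within ghes_probe().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the header of an error status block.
 * Only the one field this sketch needs is modelled. */
struct status_block {
	uint32_t block_status;	/* non-zero: firmware has posted records */
};

/* Hypothetical quirk entry: platform name plus affected firmware version. */
struct ghes_quirk {
	const char *product_name;
	const char *fw_version;
};

/* Placeholder strings, NOT real DMI values for any shipping system. */
static const struct ghes_quirk quirk_table[] = {
	{ "example-m400", "example-fw-1.0" },
};

/* True only if both the platform name and firmware version match an entry. */
static bool platform_is_quirked(const char *product, const char *fw)
{
	for (size_t i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++) {
		if (!strcmp(product, quirk_table[i].product_name) &&
		    !strcmp(fw, quirk_table[i].fw_version))
			return true;
	}
	return false;
}

/* Model of the probe-time step: on a quirked platform, throw away whatever
 * is already sitting in the status block. Returns true if records were
 * discarded, false if the block was empty or the platform is not quirked. */
static bool discard_stale_at_probe(struct status_block *blk,
				   const char *product, const char *fw)
{
	assert(blk != NULL);

	if (!blk->block_status || !platform_is_quirked(product, fw))
		return false;

	blk->block_status = 0;	/* mark the block as empty */
	return true;
}
```

Matching on both the name and the firmware version keeps the quirk from
hiding errors once fixed firmware is installed, but it still discards
everything pending at probe time on affected systems, which is the
false-sense-of-security trade-off described above.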

Thanks,

James

[0] Might be wrong, but this is where I look:
https://github.com/tianocore/edk2-platforms.git