> On Sep 10, 2018, at 2:56 PM, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
>
> Hi folks,
>
> even after commit eeb89e2bb1ac ("x86/efi: Load fixmap GDT in
> efi_call_phys_epilog()"), my i386/efi qemu boot tests still crash randomly
> (roughly 5-10% of the time). As before, I don't see much useful output in
> the qemu log (this time it doesn't even complain about a triple fault).
>
> Debugging shows that the crash happens in efi_call_phys_epilog().
> A sample log from a crashed test run is attached below. It appears that
> the crash happens if there is an interrupt at a critical section of the
> code.
>
> While playing with the code, I found a possible fix.
>
> diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
> index 05ca14222463..9959657127f4 100644
> --- a/arch/x86/platform/efi/efi_32.c
> +++ b/arch/x86/platform/efi/efi_32.c
> @@ -85,10 +85,9 @@ pgd_t * __init efi_call_phys_prolog(void)
>
>  void __init efi_call_phys_epilog(pgd_t *save_pgd)
>  {
> +	load_fixmap_gdt(0);
>  	load_cr3(save_pgd);
>  	__flush_tlb_all();
> -
> -	load_fixmap_gdt(0);
>  }

We have IRQs on here?

It seems plausible that we’re in a window where the EFI pgd doesn’t have cpu_entry_area mapped. Also, the hard coded CPU 0 is suspicious. Maybe try instrumenting the code to check whether the clone_pgd_range calls in setup_percpu.c have happened yet?

Your patch may well be correct, but, if we have IRQs on, we should really have cpu_entry_area mapped in both pgds. Or we could turn off IRQs. Why on Earth are IRQs on in a context where the fixmap gdt is unusable?
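
To answer the IRQ question directly, a one-off check at the top of the epilog might be enough. A minimal, untested sketch against the unpatched function quoted above (the WARN_ON_ONCE() line and its placement are my assumption for debugging only, not part of any proposed fix):

	void __init efi_call_phys_epilog(pgd_t *save_pgd)
	{
		/* Debug only: warn once if we get here with IRQs enabled. */
		WARN_ON_ONCE(!irqs_disabled());

		load_cr3(save_pgd);
		__flush_tlb_all();

		load_fixmap_gdt(0);
	}

If the warning fires on the crashing runs, that would support the theory that an interrupt is arriving before the fixmap GDT is usable again.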