From: Ashish Kalra <ashish.kalra@xxxxxxx>

With SNP guest kexec, the following EFI memmap corruption is observed:

[ 0.000000] efi: EFI v2.7 by EDK II
[ 0.000000] efi: SMBIOS=0x7e33f000 SMBIOS 3.0=0x7e33d000 ACPI=0x7e57e000 ACPI 2.0=0x7e57e014 MEMATTR=0x7cc3c018 Unaccepted=0x7c09e018
[ 0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
[ 0.000000] efi: mem03: [type=269370880|attr=0x0e42100e42180e41] range=[0x0486200e41038c18-0x200e898a0eee713ac17] (invalid)
[ 0.000000] efi: mem04: [type=12336|attr=0x0e410686300e4105] range=[0x100e420000000176-0x8c290f26248d200e175] (invalid)
[ 0.000000] efi: mem06: [type=1124304408|attr=0x000030b400000028] range=[0x0e51300e45280e77-0xb44ed2142f460c1e76] (invalid)
[ 0.000000] efi: mem08: [type=68|attr=0x300e540583280e41] range=[0x0000011affff3cd8-0x486200e54b38c0bcd7] (invalid)
[ 0.000000] efi: mem10: [type=1107529240|attr=0x0e42280e41300e41] range=[0x300e41058c280e42-0x38010ae54c5c328ee41] (invalid)
[ 0.000000] efi: mem11: [type=189335566|attr=0x048d200e42038e18] range=[0x0000318c00000048-0xe42029228ce4200047] (invalid)
[ 0.000000] efi: mem12: [type=239142534|attr=0x0000002400000b4b] range=[0x0e41380e0a7d700e-0x80f26238f22bfe500d] (invalid)
[ 0.000000] efi: mem14: [type=239207055|attr=0x0e41300e43380e0a] range=[0x8c280e42048d200e-0xc70b028f2f27cc0a00d] (invalid)
[ 0.000000] efi: mem15: [type=239210510|attr=0x00080e660b47080e] range=[0x0000324c0000001c-0xa78028634ce490001b] (invalid)
[ 0.000000] efi: mem16: [type=4294848528|attr=0x0000329400000014] range=[0x0e410286100e4100-0x80f252036a218f20ff] (invalid)
[ 0.000000] efi: mem19: [type=2250772033|attr=0x42180e42200e4328] range=[0x41280e0ab9020683-0xe0e538c28b39e62682] (invalid)
[ 0.000000] efi: mem20: [type=16| | | | | | | | | | |WB| |WC| ] range=[0x00000008ffff4438-0xffff44340090333c437] (invalid)
[ 0.000000] efi: mem22: [Reserved |attr=0x000000c1ffff4420] range=[0xffff442400003398-0x1033a04240003f397] (invalid)
[ 0.000000] efi: mem23: [type=1141080856|attr=0x080e41100e43180e]
range=[0x280e66300e4b280e-0x440dc5ee7141f4c080d] (invalid)
[ 0.000000] efi: mem25: [Reserved |attr=0x0000000affff44a0] range=[0xffff44a400003428-0x1034304a400013427] (invalid)
[ 0.000000] efi: mem28: [type=16| | | | | | | | | | |WB| |WC| ] range=[0x0000000affff4488-0xffff448400b034bc487] (invalid)
[ 0.000000] efi: mem30: [Reserved |attr=0x0000000affff4470] range=[0xffff447400003518-0x10352047400013517] (invalid)
[ 0.000000] efi: mem33: [type=16| | | | | | | | | | |WB| |WC| ] range=[0x0000000affff4458-0xffff445400b035ac457] (invalid)
[ 0.000000] efi: mem35: [type=269372416|attr=0x0e42100e42180e41] range=[0x0486200e44038c18-0x200e8b8a0eee823ac17] (invalid)
[ 0.000000] efi: mem37: [type=2351435330|attr=0x0e42100e42180e42] range=[0x470783380e410686-0x2002b2a041c2141e685] (invalid)
[ 0.000000] efi: mem38: [type=1093668417|attr=0x100e420000000270] range=[0x42100e42180e4220-0xfff366a4e421b78c21f] (invalid)
[ 0.000000] efi: mem39: [type=76357646|attr=0x180e42200e42280e] range=[0x0e410686300e4105-0x4130f251a0710ae5104] (invalid)
[ 0.000000] efi: mem40: [type=940444268|attr=0x0e42200e42280e41] range=[0x180e42200e42280e-0x300fc71c300b4f2480d] (invalid)
[ 0.000000] efi: mem41: [MMIO |attr=0x8c280e42048d200e] range=[0xffff479400003728-0x42138e0c87820292727] (invalid)
[ 0.000000] efi: mem42: [type=1191674680|attr=0x0000004c0000000b] range=[0x300e41380e0a0246-0x470b0f26238f22b8245] (invalid)
[ 0.000000] efi: mem43: [type=2010|attr=0x0301f00e4d078338] range=[0x45038e180e42028f-0xe4556bf118f282528e] (invalid)
[ 0.000000] efi: mem44: [type=1109921345|attr=0x300e44000000006c] range=[0x44080e42100e4218-0xfff39254e42138ac217] (invalid)
...

This EFI memmap corruption happens when efi_arch_mem_reserve() is
invoked during a kexec boot.
( efi_arch_mem_reserve() is invoked with the following call-stack: )

[ 0.310010] efi_arch_mem_reserve+0xb1/0x220
[ 0.311382] efi_mem_reserve+0x36/0x60
[ 0.311973] efi_bgrt_init+0x17d/0x1a0
[ 0.313265] acpi_parse_bgrt+0x12/0x20
[ 0.313858] acpi_table_parse+0x77/0xd0
[ 0.314463] acpi_boot_init+0x362/0x630
[ 0.315069] setup_arch+0xa88/0xf80
[ 0.315629] start_kernel+0x68/0xa90
[ 0.316194] x86_64_start_reservations+0x1c/0x30
[ 0.316921] x86_64_start_kernel+0xbf/0x110
[ 0.317582] common_startup_64+0x13e/0x141

efi_arch_mem_reserve() calls efi_memmap_alloc() to allocate memory for
the EFI memory map, and because this happens early in boot the
allocation comes from memblock.

Later during boot, efi_enter_virtual_mode() calls
kexec_enter_virtual_mode() in case of a kexec-ed kernel boot.

kexec_enter_virtual_mode() installs the new EFI memory map by calling
efi_memmap_init_late(), which remaps the efi_memmap physically allocated
in efi_arch_mem_reserve(), but this remapping still refers to the
memblock allocation.

Subsequently, when memblock is freed later in the boot flow, this
remapped efi_memmap will have random corruption (similar to a
use-after-free scenario).

The corrupted EFI memory map is then passed to the next kexec-ed kernel,
which panics when it tries to use the corrupted EFI memory map.

Fix this EFI memory map corruption by skipping efi_arch_mem_reserve()
for kexec.

Additionally, efi_mem_reserve() is used to reserve boot service memory,
e.g. BGRT, but it is not necessary for a kexec boot, as there are no boot
services in a kexec reboot at all after the first kernel's
ExitBootServices(). The UEFI memmap passed to the kexec kernel includes
not only the runtime service memory map but also the boot service memory
ranges which were reserved by the first kernel with efi_mem_reserve(),
and those boot service memory ranges have already been marked with the
EFI_MEMORY_RUNTIME attribute.
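
For illustration only, the decision order described above can be modelled
in a few lines of userspace C. This is a sketch under stated assumptions,
not the kernel code: struct demo_md, enum demo_action and demo_classify()
are hypothetical names invented here, and the struct is a simplified
stand-in for efi_memory_desc_t. The constant values are the ones defined
by the UEFI specification.

```c
#include <stdint.h>

/* Constant values as defined by the UEFI specification. */
#define EFI_BOOT_SERVICES_DATA 4u
#define EFI_MEMORY_RUNTIME     0x8000000000000000ULL
#define EFI_PAGE_SHIFT         12

/* Hypothetical, simplified stand-in for the kernel's efi_memory_desc_t. */
struct demo_md {
	uint32_t type;
	uint64_t phys_addr;
	uint64_t num_pages;
	uint64_t attribute;
};

/* Hypothetical result codes, one per early-return path. */
enum demo_action {
	DEMO_RESERVE,     /* first boot: go on and reserve the range */
	DEMO_SKIP_TYPE,   /* not EFI boot services data */
	DEMO_SKIP_KEXEC,  /* already marked runtime by the first kernel */
	DEMO_SKIP_SPAN    /* request crosses the descriptor boundary */
};

/* Apply the checks in the same order as the patched function. */
enum demo_action demo_classify(const struct demo_md *md,
			       uint64_t addr, uint64_t size)
{
	if (md->type != EFI_BOOT_SERVICES_DATA)
		return DEMO_SKIP_TYPE;

	/* A kexec-ed kernel inherits the memmap with this bit already set. */
	if (md->attribute & EFI_MEMORY_RUNTIME)
		return DEMO_SKIP_KEXEC;

	if (addr + size > md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT))
		return DEMO_SKIP_SPAN;

	return DEMO_RESERVE;
}
```

In this model a descriptor with attribute == 0 (first boot) reaches
DEMO_RESERVE, while the same descriptor with EFI_MEMORY_RUNTIME set (as
inherited by a kexec-ed kernel) returns DEMO_SKIP_KEXEC before any
memblock-backed allocation would take place.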

Since boot service ranges reserved by the first kernel already carry the
EFI_MEMORY_RUNTIME attribute, efi_mem_reserve() can be skipped for a
kexec boot by checking whether that attribute is set.

Suggested-by: Dave Young <dyoung@xxxxxxxxxx>
[Dave Young: checking the md attribute instead of checking the efi_setup]
Signed-off-by: Ashish Kalra <ashish.kalra@xxxxxxx>
---
 arch/x86/platform/efi/quirks.c | 30 +++++++++++++++++++++++++++---
 1 file changed, 27 insertions(+), 3 deletions(-)

diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
index f0cc00032751..6f398c59278a 100644
--- a/arch/x86/platform/efi/quirks.c
+++ b/arch/x86/platform/efi/quirks.c
@@ -255,15 +255,39 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
 	struct efi_memory_map_data data = { 0 };
 	struct efi_mem_range mr;
 	efi_memory_desc_t md;
-	int num_entries;
+	int num_entries, ret;
 	void *new;

-	if (efi_mem_desc_lookup(addr, &md) ||
-	    md.type != EFI_BOOT_SERVICES_DATA) {
+	/*
+	 * efi_mem_reserve() is used to reserve boot service memory, e.g. bgrt,
+	 * but it is not necessary for kexec, as there are no boot services in
+	 * kexec reboot at all after the first kernel's ExitBootServices().
+	 *
+	 * Additionally kexec_enter_virtual_mode() during late init will remap
+	 * the efi_memmap physical pages allocated here via memblock & then
+	 * subsequently cause random EFI memmap corruption once memblock is freed.
+	 *
+	 * Therefore, skip efi_mem_reserve for kexec booting by checking the
+	 * EFI_MEMORY_RUNTIME attribute which indicates boot service memory
+	 * ranges reserved by the first kernel using efi_mem_reserve and marked
+	 * with EFI_MEMORY_RUNTIME attribute.
+	 */
+
+	ret = efi_mem_desc_lookup(addr, &md);
+	if (ret) {
 		pr_err("Failed to lookup EFI memory descriptor for %pa\n", &addr);
 		return;
 	}

+	if (md.type != EFI_BOOT_SERVICES_DATA) {
+		pr_err("Skip reserving non EFI Boot Service Data memory for %pa\n", &addr);
+		return;
+	}
+
+	/* Kexec copied the efi memmap from the first kernel, thus skip the case */
+	if (md.attribute & EFI_MEMORY_RUNTIME)
+		return;
+
 	if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) {
 		pr_err("Region spans EFI memory descriptors, %pa\n", &addr);
 		return;
-- 
2.34.1

_______________________________________________
kexec mailing list
kexec@xxxxxxxxxxxxxxxxxxx
http://lists.infradead.org/mailman/listinfo/kexec