On Tue, Dec 3, 2019 at 11:53 PM Dave Young <dyoung@xxxxxxxxxx> wrote: > > Michael Weiser reported he got below error during a kexec rebooting: > esrt: Unsupported ESRT version 2904149718861218184. > > The ESRT memory stays in EFI boot services data, and it was reserved > in kernel via efi_mem_reserve(). The initial purpose of the reservation > is to reuse the EFI boot services data across kexec reboot. For example > the BGRT image data and some ESRT memory like Michael reported. > > But although the memory is reserved it is not updated in X86 e820 table. > And kexec_file_load iterate system ram in io resource list to find places > for kernel, initramfs and other stuff. In Michael's case the kexec loaded > initramfs overwritten the ESRT memory and then the failure happened. > > Since kexec_file_load depends on the e820 to be updated, just fix this > by updating the reserved EFI boot services memory as reserved type in e820. > > Originally any memory descriptors with EFI_MEMORY_RUNTIME attribute are > bypassed in the reservation code path because they are assumed as reserved. > But the reservation is still needed for multiple kexec reboot. > And it is the only possible case we come here thus just drop the code > chunk then everything works without side effects. > > On my machine the ESRT memory sits in an EFI runtime data range, it does > not trigger the problem, but I successfully tested with BGRT instead. > both kexec_load and kexec_file_load work and kdump works as well. > > Signed-off-by: Dave Young <dyoung@xxxxxxxxxx> > --- > arch/x86/platform/efi/quirks.c | 6 ++---- > 1 file changed, 2 insertions(+), 4 deletions(-) > > --- linux-x86.orig/arch/x86/platform/efi/quirks.c > +++ linux-x86/arch/x86/platform/efi/quirks.c > @@ -260,10 +260,6 @@ void __init efi_arch_mem_reserve(phys_ad > return; > } > > - /* No need to reserve regions that will never be freed. */ > - if (md.attribute & EFI_MEMORY_RUNTIME) > - return; > - > size += addr % EFI_PAGE_SIZE; > size = round_up(size, EFI_PAGE_SIZE); > addr = round_down(addr, EFI_PAGE_SIZE); > @@ -293,6 +289,8 @@ void __init efi_arch_mem_reserve(phys_ad > early_memunmap(new, new_size); > > efi_memmap_install(new_phys, num_entries); > + e820__range_update(addr, size, E820_TYPE_RAM, E820_TYPE_RESERVED); > + e820__update_table(e820_table); > } > > /* > Bisect says this change (commit af1648984828) is triggering a regression, likely not urgent, in my testing of the new efi_fake_mem= facility to allow memory to be marked "soft reserved" via the kernel command line (commit 199c84717612 x86/efi: Add efi_fake_mem support for EFI_MEMORY_SP). The following command line triggers the crash signature below: efi_fake_mem=4G@9G:0x40000,4G@13G:0x40000 However, this command line works ok: efi_fake_mem=8G@9G:0x40000 So, something about multiple efi_fake_mem statements interacts badly with this change. Nothing obvious occurs to me at the moment, I'll keep debugging, but wanted to highlight this in the meantime in case someone else sees a deeper issue or the root cause. BUG: unable to handle page fault for address: ffffffffff281000 #PF: supervisor write access in kernel mode #PF: error_code(0x0002) - not-present page PGD 188615067 P4D 188615067 PUD 188617067 PMD 188e4d067 PTE 0 Oops: 0002 [#1] SMP PTI CPU: 0 PID: 0 Comm: swapper Not tainted 5.4.0+ #154 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 0.0.0 02/06/2015 RIP: 0010:efi_memmap_insert+0xed/0x14b Code: 48 89 48 18 49 39 d9 76 67 49 39 d1 73 62 4c 89 c9 48 2b 48 08 4c 89 c6 48 c1 e9 0c 48 89 48 18 49 8b 4a 28 48 01 c8 48 89 c7 <f3> a4 49 39 d3 73 2c 4c 89 48 08 4c 29 da 4c 89 c6 4c 89 68 18 48 RSP: 0000:ffffffffb7603d70 EFLAGS: 00010086 RAX: ffffffffff280ff0 RBX: 0000000000000000 RCX: 0000000000000020 RDX: ffffffffffffffff RSI: ffffffffff200fe0 RDI: ffffffffff281000 RBP: 00000000bea2d000 R08: ffffffffff200fd0 R09: 00000000bea06000 R10: ffffffffb77e1718 R11: 00000000bea2cfff R12: 800000000000000f R13: 0000000000000027 R14: ffffffff415fa001 R15: 0000000000000ab0 FS: 0000000000000000(0000) GS:ffffffffb7c31000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffff281000 CR3: 0000000188610000 CR4: 00000000000606b0 Call Trace: ? efi_arch_mem_reserve+0x149/0x1a6 ? bgrt_init+0xbe/0xbe ? bgrt_init+0xbe/0xbe ? acpi_parse_bgrt+0xa/0xd ? acpi_table_parse+0x86/0xb8 ? acpi_boot_init+0x494/0x4e3 ? acpi_parse_x2apic+0x87/0x87 ? setup_acpi_sci+0xa2/0xa2 ? setup_arch+0x8db/0x9e1 ? start_kernel+0x6a/0x547 ? secondary_startup_64+0xb6/0xc0 Modules linked in: CR2: ffffffffff281000 random: get_random_bytes called from print_oops_end_marker+0x26/0x40 with crng_init=0 ---[ end trace 2acc14aa0990ee9d ]--- RIP: 0010:efi_memmap_insert+0xed/0x14b Code: 48 89 48 18 49 39 d9 76 67 49 39 d1 73 62 4c 89 c9 48 2b 48 08 4c 89 c6 48 c1 e9 0c 48 89 48 18 49 8b 4a 28 48 01 c8 48 89 c7 <f3> a4 49 39 d3 73 2c 4c 89 48 08 4c 29 da 4c 89 c6 4c 89 68 18 48 RSP: 0000:ffffffffb7603d70 EFLAGS: 00010086 RAX: ffffffffff280ff0 RBX: 0000000000000000 RCX: 0000000000000020 RDX: ffffffffffffffff RSI: ffffffffff200fe0 RDI: ffffffffff281000 RBP: 00000000bea2d000 R08: ffffffffff200fd0 R09: 00000000bea06000 R10: ffffffffb77e1718 R11: 00000000bea2cfff R12: 800000000000000f R13: 0000000000000027 R14: ffffffff415fa001 R15: 0000000000000ab0 FS: 0000000000000000(0000) GS:ffffffffb7c31000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffffffffff281000 CR3: 0000000188610000 CR4: 00000000000606b0 Kernel panic - not syncing: Fatal exception ---[ end Kernel panic - not syncing: Fatal exception ]--- _______________________________________________ kexec mailing list kexec@xxxxxxxxxxxxxxxxxxx http://lists.infradead.org/mailman/listinfo/kexec