On 12/31/19 at 09:04pm, Dan Williams wrote: > On Tue, Dec 31, 2019 at 8:52 PM Dave Young <dyoung@xxxxxxxxxx> wrote: > > > > Hi Dan, > > On 12/31/19 at 02:04pm, Dan Williams wrote: > > > Dave noticed that when specifying multiple efi_fake_mem= entries only > > > the last entry was successfully being reflected in the efi memory map. > > > This is due to the fact that the efi_memmap_insert() is being called > > > multiple times, but on successive invocations the insertion should be > > > applied to the last new memmap rather than the original map at > > > efi_fake_memmap() entry. > > > > > > Rework efi_fake_memmap() to install the new memory map after each > > > efi_fake_mem= entry is parsed. > > > > > > This also fixes an issue in efi_fake_memmap() that caused it to litter > > > emtpy entries into the end of the efi memory map. The empty entry causes > > > efi_memmap_insert() to attempt more memmap splits / copies than > > > efi_memmap_split_count() accounted for when sizing the new map. > > > > > > BUG: unable to handle page fault for address: ffffffffff281000 > > > [..] > > > RIP: 0010:efi_memmap_insert+0x11d/0x191 > > > [..] > > > Call Trace: > > > ? bgrt_init+0xbe/0xbe > > > ? efi_arch_mem_reserve+0x1cb/0x228 > > > ? acpi_parse_bgrt+0xa/0xd > > > ? acpi_table_parse+0x86/0xb8 > > > ? acpi_boot_init+0x494/0x4e3 > > > ? acpi_parse_x2apic+0x87/0x87 > > > ? setup_acpi_sci+0xa2/0xa2 > > > ? setup_arch+0x8db/0x9e1 > > > ? start_kernel+0x6a/0x547 > > > ? secondary_startup_64+0xb6/0xc0 > > > > > > Commit af1648984828 "x86/efi: Update e820 with reserved EFI boot > > > services data to fix kexec breakage" is listed in Fixes: since it > > > introduces more occurrences where efi_memmap_insert() is invoked after > > > an efi_fake_mem= configuration has been parsed. Previously the side > > > effects of vestigial empty entries were benign, but with commit > > > af1648984828 that follow-on efi_memmap_insert() invocation triggers the > > > above crash signature. > > > > > > Fixes: 0f96a99dab36 ("efi: Add 'efi_fake_mem' boot option") > > > Fixes: af1648984828 ("x86/efi: Update e820 with reserved EFI boot services...") > > > Link: https://lore.kernel.org/r/20191231014630.GA24942@xxxxxxxxxxxxxxxxxxxxxxxxxx > > > Reported-by: Dave Young <dyoung@xxxxxxxxxx> > > > Cc: Taku Izumi <izumi.taku@xxxxxxxxxxxxxx> > > > Cc: Michael Weiser <michael@xxxxxxxxxxxxxxxxxxx> > > > Cc: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> > > > Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx> > > > Cc: Ingo Molnar <mingo@xxxxxxxxxx> > > > Signed-off-by: Dan Williams <dan.j.williams@xxxxxxxxx> > > > --- > > > drivers/firmware/efi/fake_mem.c | 32 +++++++++++++++++--------------- > > > drivers/firmware/efi/memmap.c | 2 +- > > > include/linux/efi.h | 2 ++ > > > 3 files changed, 20 insertions(+), 16 deletions(-) > > > > > > diff --git a/drivers/firmware/efi/fake_mem.c b/drivers/firmware/efi/fake_mem.c > > > index 7e53e5520548..68d752d8af21 100644 > > > --- a/drivers/firmware/efi/fake_mem.c > > > +++ b/drivers/firmware/efi/fake_mem.c > > > @@ -34,26 +34,17 @@ static int __init cmp_fake_mem(const void *x1, const void *x2) > > > return 0; > > > } > > > > > > -void __init efi_fake_memmap(void) > > > +static void __init efi_fake_range(struct efi_mem_range *efi_range) > > > { > > > int new_nr_map = efi.memmap.nr_map; > > > efi_memory_desc_t *md; > > > phys_addr_t new_memmap_phy; > > > unsigned long flags = 0; > > > void *new_memmap; > > > - int i; > > > - > > > - if (!efi_enabled(EFI_MEMMAP) || !nr_fake_mem) > > > - return; > > > > > > /* count up the number of EFI memory descriptor */ > > > - for (i = 0; i < nr_fake_mem; i++) { > > > - for_each_efi_memory_desc(md) { > > > - struct range *r = &efi_fake_mems[i].range; > > > - > > > - new_nr_map += efi_memmap_split_count(md, r); > > > - } > > > - } > > > + for_each_efi_memory_desc(md) > > > + new_nr_map += efi_memmap_split_count(md, &efi_range->range); > > > > I have another concern here :( > > > > THe efi_memmap_split_count mean to only split for a specific md, and you > > can see arch/x86/platform/efi/quirks.c about the use: > > if (addr + size > md.phys_addr + (md.num_pages << EFI_PAGE_SHIFT)) { > > pr_err("Region spans EFI memory descriptors, %pa\n", &addr); > > return; > > } > > > > Any memory region to be inserted but spans different md will be > > rejected. So the memmap insert logic seems does not support the > > spanned ranges. I did not find a case two contiguous same type ranges > > eg. two "Conventional memory", if have they should have been merged. > > > > So maybe just use same way as the quirks.c here to find the valid md first > > then get the split count? > > I don't immediately see why it would be a problem to just let the md > loop that efi_fake_memmap() performs try to split multiple entries. It > may end up with more splits than necessary in which case we'll need > that piece from my original patch to clean those up. Thanks for the > heads up, I'll give it a try and see what shakes out. Are you seeing > any misbehavior on your end? Just some worries, but I did not see any misbehaviors :) > > > > > Otherwise I tested the series bootup test passed. > > > > BTW, another issue about fakemem, currently it only works with normal > > physical boot, in case of kexec reboot the kernel only aware of EFI > > runtime memory ranges, we do not pass other types in memmap. But maybe > > we can live with it considering fake mem is only for debugging purpose. > > Does kexec preserve iomem? I.e. as long as the initial translation of > efi entries to e820, and resulting resource tree, is preserved by > successive kexec cycles then I think we're ok. It will not preserve them automatically, but that can be fixed if needed. There are two places: 1. the in kernel loader, we can do similar with below commit (for Soft Reseved instead): commit 980621daf368f2b9aa69c7ea01baa654edb7577b Author: Lianbo Jiang <lijiang@xxxxxxxxxx> Date: Tue Apr 23 09:30:07 2019 +0800 x86/crash: Add e820 reserved ranges to kdump kernel's e820 table 2. the userspace loader, in kexec-tools: It only parse and pass "Reserved" for the time being, also need handle the Soft Reserved" part as well. Thanks Dave