On 23 October 2014 21:14, Mark Rutland <mark.rutland@xxxxxxx> wrote: > [...] > >> > > It also looks like EFI_BOOT flag will be set even if uefi_init fails. >> > > If uefi_init fails, we only need reserve_regions() for the purpose >> > > of adding memblocks. Otherwise, we end up wasting a lot of memory. >> > >> > What memory are we wasting in that case? Surely the only items that we >> > could choose to throw away if we failed uefi_init are >> > EFI_ACPI_RECLAIM_MEMORY and EFI_MEMORY_RUNTIME? >> > >> > We might want to keep those around so we can kexec into a kernel where >> > we can make use of them. Surely they shouldn't take up a significant >> > proportion of the available memory? >> >> In addition, reserve_regions() also reserves BOOT_SERVICES (which gets >> freed up later after set_virtual_address_map(). That won't happen if >> uefi_init() fails. In any case, if uefi_init() fails, we won't be able >> to find the ACPI tables kexec kernel won't be able to either. > > If you need the ACPI tables, the first kernel won't boot. However, if we > fail to map the runtime services data and code, that doesn't mean the > next kernel can't. We might have failed to map the runtime services due > to an early_ioremap bug or similar, and that could be fixed in the > kernel we're about to kexec to. > > If we're going to nuke runtime regions, we should nuke them in the > memory map descriptors as a courtesy to the next kernel. Otherwise we > could be more courteous and simply leave them be (which is my > preference). > >> The BOOT_SERVICES stuff is the big chunk. There was some discussion of >> not reserving that but no one has gotten around to writing the patch >> to clean out the code that does it. > > I've just spent some time doing so; patch below. It boots happily on my > Juno (with 4KiB pages at least), but I haven't given it much of a stress > test. > > I wasn't sure if unusable memory could show up with EFI_MEMORY_WB > attributes. If so, this patch still won't prevent that from being mapped > as cacheable, but that should be easy to fix. > > Thanks, > Mark. > > ---->8---- > From 581d5bac5b4a6f93e23737012b71c58d809af6bb Mon Sep 17 00:00:00 2001 > From: Mark Rutland <mark.rutland@xxxxxxx> > Date: Thu, 23 Oct 2014 19:41:55 +0100 > Subject: [PATCH] arm64: efi: Simplify memory descriptor handling > > Currently discovering the memory map from UEFI is a multi-stage affair, > where the boot services code and data are kept around for much longer > than necessary. Additionally, potential overlap between memory and IO > mappings at kernel page granularity is not handled. > > This patch attempts to solve both of these issues, with the addition and > filtering of memory regions being handled in a single function. > > First, all normal memory (i.e. anything which can be mapped writeback > cacheable) is added to memblock. Second, IO and reserved regions are > filtered out of memblock at kernel page granularity, to prevent > potential conflicting mappings. > Fortunately, this is being addressed in an upcoming UEFI spec revision: it will not be allowed for memory regions that could potentially conflict in terms of cache attributes to share the same 64 KB page frame. As I mentioned, my kexec patches will move SetVirtualAddressMap() to the stub, which means we will also be able to drop the id map bits entirely. As I said, let's revisit this in that context, and drop it for now. -- Ard. > As some reserved regions must be present in the idmap these are > reserved. Everything else that's not normal memory is removed, > preventing the possibility of a cacheable mapping covering something it > shouldn't. > > Signed-off-by: Mark Rutland <mark.rutland@xxxxxxx> > Cc: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx > Cc: Catalin Marinas <catalin.marinas@xxxxxxx> > Cc: Leif Lindholm <leif.lindholm@xxxxxxxxxx > Cc: Mark Salter <msalter@xxxxxxxxxx> > --- > arch/arm64/kernel/efi.c | 181 +++++++++++------------------------------------- > 1 file changed, 42 insertions(+), 139 deletions(-) > > diff --git a/arch/arm64/kernel/efi.c b/arch/arm64/kernel/efi.c > index 03aaa99..8873c28 100644 > --- a/arch/arm64/kernel/efi.c > +++ b/arch/arm64/kernel/efi.c > @@ -154,161 +154,66 @@ static __init int is_reserve_region(efi_memory_desc_t *md) > return 0; > } > > -static __init void reserve_regions(void) > +static __init void efi_add_memory(void) > { > efi_memory_desc_t *md; > - u64 paddr, npages, size; > + u64 paddr, size; > > if (uefi_debug) > - pr_info("Processing EFI memory map:\n"); > + pr_info("EFI: adding normal memory:\n"); > > for_each_efi_memory_desc(&memmap, md) { > + if (!is_normal_ram(md)) > + continue; > + > paddr = md->phys_addr; > - npages = md->num_pages; > + size = md->num_pages << EFI_PAGE_SHIFT; > > if (uefi_debug) > - pr_info(" 0x%012llx-0x%012llx [%s]", > - paddr, paddr + (npages << EFI_PAGE_SHIFT) - 1, > + pr_info(" 0x%012llx-0x%012llx [%s]\n", > + paddr, size - 1, > memory_type_name[md->type]); > > - memrange_efi_to_native(&paddr, &npages); > - size = npages << PAGE_SHIFT; > + early_init_dt_add_memory_arch(paddr, size); > + } > + > + /* > + * Regions describe in an EFI_MEMORY_DESCRIPTOR are only guaranteed to > + * be 4KiB aligned but the kernel page size could be larger, and > + * regions with different attributes could fall into the same kernel > + * page. Thus we must remove any memory in the same kernel page as an > + * IO region. > + * > + * Reserved regions of memory need to be in the idmap, so just reserve > + * them. > + */ > + if (uefi_debug) > + pr_info("EFI: correcting for overlapping regions:\n"); > + > + for_each_efi_memory_desc(&memmap, md) { > + if (!is_reserve_region(md) && is_normal_ram(md)) > + continue; > + > + paddr = md->phys_addr; > + size = md->num_pages << EFI_PAGE_SHIFT; > + > + size = PAGE_ALIGN(size + (paddr & ~PAGE_MASK)); > + paddr &= PAGE_MASK; > > - if (is_normal_ram(md)) > - early_init_dt_add_memory_arch(paddr, size); > + if (uefi_debug) > + pr_info(" 0x%012llx-0x%012llx [%s]\n", > + paddr, size - 1, > + memory_type_name[md->type]); > > - if (is_reserve_region(md) || > - md->type == EFI_BOOT_SERVICES_CODE || > - md->type == EFI_BOOT_SERVICES_DATA) { > + if (is_reserve_region(md)) > memblock_reserve(paddr, size); > - if (uefi_debug) > - pr_cont("*"); > - } > - > - if (uefi_debug) > - pr_cont("\n"); > + else > + memblock_remove(paddr, size); > } > > set_bit(EFI_MEMMAP, &efi.flags); > } > > - > -static u64 __init free_one_region(u64 start, u64 end) > -{ > - u64 size = end - start; > - > - if (uefi_debug) > - pr_info(" EFI freeing: 0x%012llx-0x%012llx\n", start, end - 1); > - > - free_bootmem_late(start, size); > - return size; > -} > - > -static u64 __init free_region(u64 start, u64 end) > -{ > - u64 map_start, map_end, total = 0; > - > - if (end <= start) > - return total; > - > - map_start = (u64)memmap.phys_map; > - map_end = PAGE_ALIGN(map_start + (memmap.map_end - memmap.map)); > - map_start &= PAGE_MASK; > - > - if (start < map_end && end > map_start) { > - /* region overlaps UEFI memmap */ > - if (start < map_start) > - total += free_one_region(start, map_start); > - > - if (map_end < end) > - total += free_one_region(map_end, end); > - } else > - total += free_one_region(start, end); > - > - return total; > -} > - > -static void __init free_boot_services(void) > -{ > - u64 total_freed = 0; > - u64 keep_end, free_start, free_end; > - efi_memory_desc_t *md; > - > - /* > - * If kernel uses larger pages than UEFI, we have to be careful > - * not to inadvertantly free memory we want to keep if there is > - * overlap at the kernel page size alignment. We do not want to > - * free is_reserve_region() memory nor the UEFI memmap itself. > - * > - * The memory map is sorted, so we keep track of the end of > - * any previous region we want to keep, remember any region > - * we want to free and defer freeing it until we encounter > - * the next region we want to keep. This way, before freeing > - * it, we can clip it as needed to avoid freeing memory we > - * want to keep for UEFI. > - */ > - > - keep_end = 0; > - free_start = 0; > - > - for_each_efi_memory_desc(&memmap, md) { > - u64 paddr, npages, size; > - > - if (is_reserve_region(md)) { > - /* > - * We don't want to free any memory from this region. > - */ > - if (free_start) { > - /* adjust free_end then free region */ > - if (free_end > md->phys_addr) > - free_end -= PAGE_SIZE; > - total_freed += free_region(free_start, free_end); > - free_start = 0; > - } > - keep_end = md->phys_addr + (md->num_pages << EFI_PAGE_SHIFT); > - continue; > - } > - > - if (md->type != EFI_BOOT_SERVICES_CODE && > - md->type != EFI_BOOT_SERVICES_DATA) { > - /* no need to free this region */ > - continue; > - } > - > - /* > - * We want to free memory from this region. > - */ > - paddr = md->phys_addr; > - npages = md->num_pages; > - memrange_efi_to_native(&paddr, &npages); > - size = npages << PAGE_SHIFT; > - > - if (free_start) { > - if (paddr <= free_end) > - free_end = paddr + size; > - else { > - total_freed += free_region(free_start, free_end); > - free_start = paddr; > - free_end = paddr + size; > - } > - } else { > - free_start = paddr; > - free_end = paddr + size; > - } > - if (free_start < keep_end) { > - free_start += PAGE_SIZE; > - if (free_start >= free_end) > - free_start = 0; > - } > - } > - if (free_start) > - total_freed += free_region(free_start, free_end); > - > - if (total_freed) > - pr_info("Freed 0x%llx bytes of EFI boot services memory", > - total_freed); > -} > - > void __init efi_init(void) > { > struct efi_fdt_params params; > @@ -330,7 +235,7 @@ void __init efi_init(void) > if (uefi_init() < 0) > return; > > - reserve_regions(); > + efi_add_memory(); > } > > void __init efi_idmap_init(void) > @@ -452,8 +357,6 @@ static int __init arm64_enter_virtual_mode(void) > > kfree(virtmap); > > - free_boot_services(); > - > if (status != EFI_SUCCESS) { > pr_err("Failed to set EFI virtual address map! [%lx]\n", > status); > -- > 1.9.1 > -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html