On 5 January 2015 at 16:20, Mark Rutland <mark.rutland@xxxxxxx> wrote: > Hi Ard, > > I have a few (minor) comments below. > > On Mon, Dec 22, 2014 at 10:59:02AM +0000, Ard Biesheuvel wrote: >> In order to support kexec, the kernel needs to be able to deal with the >> state of the UEFI firmware after SetVirtualAddressMap() has been called. >> To avoid having separate code paths for non-kexec and kexec, let's move >> the call to SetVirtualAddressMap() to the stub: this will guarantee us >> that it will only be called once (since the stub is not executed during >> kexec), and ensures that the UEFI state is identical between kexec and >> normal boot. >> >> This implies that the layout of the virtual mapping needs to be created >> by the stub as well. All regions are rounded up to a naturally aligned >> multiple of 64 KB (for compatibility with 64k pages kernels) and recorded >> in the UEFI memory map. The kernel proper reads those values and installs >> the mappings in a dedicated set of page tables that are swapped in during >> UEFI Runtime Services calls. >> >> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> >> --- >> arch/arm64/include/asm/efi.h | 20 +++- >> arch/arm64/kernel/efi.c | 223 ++++++++++++++++++++----------------- >> arch/arm64/kernel/setup.c | 1 + >> drivers/firmware/efi/libstub/fdt.c | 137 ++++++++++++++++++++++- >> 4 files changed, 270 insertions(+), 111 deletions(-) > > [...] > >> +static void efi_set_pgd(struct mm_struct *mm) >> +{ >> + cpu_switch_mm(mm->pgd, mm); >> + flush_tlb_all(); >> + if (icache_is_aivivt()) >> + __flush_icache_all(); >> +} > > Do we have any idea how often we call runtime services? > > I assume not all that often (read the RTC at boot, set/get variables). > > If we're nuking the TLBs and I-cache a lot we'll probably need to > reserve an asid for the EFI virtmap. > As pointed out by Leif, we should not be calling runtime services often enough for this to have an impact imo >> +void __init efi_virtmap_init(void) >> +{ >> + efi_memory_desc_t *md; >> + >> + if (!efi_enabled(EFI_BOOT)) >> + return; >> + >> + for_each_efi_memory_desc(&memmap, md) { >> + u64 paddr, npages, size; >> + pgprot_t prot; >> + >> + if (!(md->attribute & EFI_MEMORY_RUNTIME)) >> + continue; >> + if (WARN(md->virt_addr == 0, >> + "UEFI virtual mapping incomplete or missing -- no entry found for 0x%llx\n", >> + md->phys_addr)) >> + return; > > I wonder if we might want to use a different address to signal a bad > mapping (e.g. 0UL-1 as it's not even EFI_PAGE_SIZE aligned), just in > case we have to handle a valid use of zero in future for some reason -- > perhaps coming from another OS. > The spec suggests that a failed call to SetVirtualAddressMap() means that the old 1:1 mapping is still in effect, and booting can continue as if the call was never made (This is suggested by the list of return codes, which are all based on validation of the input) That is also why efi_to_phys() returns the physical address if it doesn't find a virtual address, i.e., so that we can at least initialize EFI and the memory map. So afaict, there is really no reason to distinguish between 'SVAM() was not called' and 'SVAM() was called but didn't like its input args'. >> + >> + paddr = md->phys_addr; >> + npages = md->num_pages; >> + memrange_efi_to_native(&paddr, &npages); >> + size = npages << PAGE_SHIFT; >> + >> + pr_info(" EFI remap 0x%012llx => %p\n", > > Why not use %016llx? We might only have 48-bit PAs currently, but we may > as well keep the VA and PA equally long when printing out -- that'll > also help to identify issues with garbage in the upper 16 bits of the > PA field. > > [...] > Sure, why not. It is code I carried over from the previous version, but I am happy to change it here >> +/* >> + * This is the base address at which to start allocating virtual memory ranges >> + * for UEFI Runtime Services. This is a userland range so that we can use any >> + * allocation we choose, and eliminate the risk of a conflict after kexec. >> + */ >> +#define EFI_RT_VIRTUAL_BASE 0x40000000 > > Nit: I'm not keen on the term "userland" here. You can map stuff to EL0 > in the high half of the address space if you wanted, so TTBR0/TTBR1 > aren't architecturally user/kernel. > > s/userland/low half/, perhaps? > OK > It might also be worth pointing out that the arbitrary base address > isn't zero so as to be less likely to be an idmap. > It is the largest non-zero base2 boundary that can reasonably be shared with v7, even with a 2:2 split. I will add a comment to that effect. >> + >> +static void update_memory_map(efi_memory_desc_t *memory_map, >> + unsigned long map_size, unsigned long desc_size, >> + int *count) >> +{ >> + u64 efi_virt_base = EFI_RT_VIRTUAL_BASE; >> + efi_memory_desc_t *out = (void *)memory_map + map_size; >> + int l; >> + >> + for (l = 0; l < map_size; l += desc_size) { >> + efi_memory_desc_t *in = (void *)memory_map + l; >> + u64 paddr, size; >> + >> + if (!(in->attribute & EFI_MEMORY_RUNTIME)) >> + continue; >> + >> + /* >> + * Make the mapping compatible with 64k pages: this allows >> + * a 4k page size kernel to kexec a 64k page size kernel and >> + * vice versa. >> + */ >> + paddr = round_down(in->phys_addr, SZ_64K); >> + size = round_up(in->num_pages * EFI_PAGE_SIZE + >> + in->phys_addr - paddr, SZ_64K); >> + >> + /* >> + * Avoid wasting memory on PTEs by choosing a virtual base that >> + * is compatible with section mappings if this region has the >> + * appropriate size and physical alignment. (Sections are 2 MB >> + * on 4k granule kernels) >> + */ >> + if (IS_ALIGNED(in->phys_addr, SZ_2M) && size >= SZ_2M) >> + efi_virt_base = round_up(efi_virt_base, SZ_2M); >> + >> + in->virt_addr = efi_virt_base + in->phys_addr - paddr; >> + efi_virt_base += size; >> + >> + memcpy(out, in, desc_size); >> + out = (void *)out + desc_size; >> + ++*count; >> + } >> +} > > This feels like this should live in arch/arm64, or under some other > directory specific to arm/arm64. That said, I guess the fdt stuff is > currently arm-specific anyway, so perhaps this is fine. > > [...] > I can move this function (and get_memory_map) to arm-stub.c instead. >> @@ -248,12 +337,52 @@ efi_status_t allocate_new_fdt_and_exit_boot(efi_system_table_t *sys_table, >> } >> } >> >> + /* >> + * Update the memory map with virtual addresses. The function will also >> + * populate the spare second half of the memory_map allocation with >> + * copies of just the EFI_MEMORY_RUNTIME entries so that we can pass it >> + * straight into SetVirtualAddressMap() >> + */ >> + update_memory_map(memory_map, map_size, desc_size, >> + &runtime_entry_count); >> + >> + pr_efi(sys_table, >> + "Exiting boot services and installing virtual address map...\n"); > > I believe that the memory map is allowed to change as a result of this > call, so I think this needs to be moved before update_memory_map. > Yes, you mentioned that before but I forgot about it. I will fix that as well. -- Ard. -- To unsubscribe from this list: send the line "unsubscribe linux-efi" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html