Re: [PATCH] x86/efi: Map EFI memmap entries in-order at runtime

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi, 

On Fri, Sep 04, 2015 at 02:14:07PM +0100, Matt Fleming wrote:
> From: Matt Fleming <matt.fleming@xxxxxxxxx>
> 
> Beginning with UEFI v2.5 EFI_PROPERTIES_TABLE was introduced that
> signals that the firmware PE/COFF loader supports splitting code and
> data sections of PE/COFF images into separate EFI memory map entries.
> This allows the kernel to map those regions with strict memory
> protections, e.g. EFI_MEMORY_RO for code, EFI_MEMORY_XP for data, etc.
> 
> Unfortunately, an unwritten requirement of this new feature is that
> the regions need to be mapped with the same offsets relative to each
> other as observed in the EFI memory map. If this is not done crashes
> like this may occur,
> 
>  [    0.006391] BUG: unable to handle kernel paging request at fffffffefe6086dd
>  [    0.006923] IP: [<fffffffefe6086dd>] 0xfffffffefe6086dd
>  [    0.007000] Call Trace:
>  [    0.007000]  [<ffffffff8104c90e>] efi_call+0x7e/0x100
>  [    0.007000]  [<ffffffff81602091>] ? virt_efi_set_variable+0x61/0x90
>  [    0.007000]  [<ffffffff8104c583>] efi_delete_dummy_variable+0x63/0x70
>  [    0.007000]  [<ffffffff81f4e4aa>] efi_enter_virtual_mode+0x383/0x392
>  [    0.007000]  [<ffffffff81f37e1b>] start_kernel+0x38a/0x417
>  [    0.007000]  [<ffffffff81f37495>] x86_64_start_reservations+0x2a/0x2c
>  [    0.007000]  [<ffffffff81f37582>] x86_64_start_kernel+0xeb/0xef
> 
> Here 0xfffffffefe6086dd refers to an address the firmware expects to
> be mapped but which the OS never claimed was mapped. The issue is that
> included in these regions are relative addresses to other regions
> which were emitted by the firmware toolchain before the "splitting" of
> sections occurred at runtime.
> 
> Needless to say, we don't satisfy this unwritten requirement on x86_64
> and instead map the EFI memory map entries in reverse order. The above
> crash is almost certainly triggerable with any kernel newer than v3.13
> because that's when we rewrote the EFI runtime region mapping code, in
> commit d2f7cbe7b26a ("x86/efi: Runtime services virtual mapping"). For
> kernel versions before v3.13 things may work by pure luck depending on
> the fragmentation of the kernel virtual address space at the time we
> map the EFI regions.
> 
> Instead of mapping the EFI memory map entries in reverse order, where
> entry N has a higher virtual address than entry N+1, map them in the
> same order as they appear in the EFI memory map to preserve this
> relative offset between regions.
> 
> This patch has been kept as small as possible with the intention that
> it should be applied aggressively to stable and distribution kernels.
> It is very much a bugfix rather than support for a new feature, since
> when EFI_PROPERTIES_TABLE is enabled we must map things as outlined
> above to even boot - we have no way of asking the firmware not to
> split the code/data regions.
> 
> In fact, this patch doesn't even make use of the more strict memory
> protections available in UEFI v2.5. That will come later.
> 
> Reported-by: Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx>
> Cc: Borislav Petkov <bp@xxxxxxx>
> Cc: Leif Lindholm <leif.lindholm@xxxxxxxxxx>
> Cc: Peter Jones <pjones@xxxxxxxxxx>
> Cc: James Bottomley <JBottomley@xxxxxxxx>
> Cc: Matthew Garrett <mjg59@xxxxxxxxxxxxx>
> Cc: H. Peter Anvin <hpa@xxxxxxxxx>
> Cc: Dave Young <dyoung@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Signed-off-by: Matt Fleming <matt.fleming@xxxxxxxxx>

This patch works to me on Intel S1200V3RPS to fix issue:
DMI: Intel Corporation (uefidk.com) Intel Server Board S1200V3RPS UEFI Development Kit/ROMLEY, BIOS 2.0

Tested-by: Lee, Chun-Yi <jlee@xxxxxxxx>

> ---
>  arch/x86/include/asm/efi.h     |   1 +
>  arch/x86/platform/efi/efi.c    |   2 +
>  arch/x86/platform/efi/efi_32.c |   1 +
>  arch/x86/platform/efi/efi_64.c | 109 ++++++++++++++++++++++++++++++++++-------
>  4 files changed, 96 insertions(+), 17 deletions(-)
> 
> diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
> index 155162ea0e00..fe988599c5e1 100644
> --- a/arch/x86/include/asm/efi.h
> +++ b/arch/x86/include/asm/efi.h
> @@ -96,6 +96,7 @@ extern void __init efi_call_phys_epilog(pgd_t *save_pgd);
>  extern void __init efi_unmap_memmap(void);
>  extern void __init efi_memory_uc(u64 addr, unsigned long size);
>  extern void __init efi_map_region(efi_memory_desc_t *md);
> +extern void __init efi_map_calculate_base(void);
>  extern void __init efi_map_region_fixed(efi_memory_desc_t *md);
>  extern void efi_sync_low_kernel_mappings(void);
>  extern int __init efi_setup_page_tables(unsigned long pa_memmap, unsigned num_pages);
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index e4308fe6afe8..5276ec6eefef 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -714,6 +714,8 @@ static void * __init efi_map_regions(int *count, int *pg_shift)
>  	unsigned long left = 0;
>  	efi_memory_desc_t *md;
>  
> +	efi_map_calculate_base();
> +
>  	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
>  		md = p;
>  		if (!(md->attribute & EFI_MEMORY_RUNTIME)) {
> diff --git a/arch/x86/platform/efi/efi_32.c b/arch/x86/platform/efi/efi_32.c
> index ed5b67338294..8a80baa877fb 100644
> --- a/arch/x86/platform/efi/efi_32.c
> +++ b/arch/x86/platform/efi/efi_32.c
> @@ -55,6 +55,7 @@ void __init efi_map_region(efi_memory_desc_t *md)
>  
>  void __init efi_map_region_fixed(efi_memory_desc_t *md) {}
>  void __init parse_efi_setup(u64 phys_addr, u32 data_len) {}
> +void __init efi_map_calculate_base(void) {}
>  
>  pgd_t * __init efi_call_phys_prolog(void)
>  {
> diff --git a/arch/x86/platform/efi/efi_64.c b/arch/x86/platform/efi/efi_64.c
> index a0ac0f9c307f..4c1d15984a79 100644
> --- a/arch/x86/platform/efi/efi_64.c
> +++ b/arch/x86/platform/efi/efi_64.c
> @@ -214,14 +214,52 @@ static void __init __map_region(efi_memory_desc_t *md, u64 va)
>  			   md->phys_addr, va);
>  }
>  
> +/**
> + * efi_map_start_addr - Find an address to map an EFI region
> + * @md: EFI memory map region descriptor
> + * @addr: Begin searching from this address
> + *
> + * Calculate the next highest virtual address at which to map
> + * @md->phys_addr while abiding by any alignment restrictions. For
> + * example, if @md->phys_addr is 2M-aligned we keep the virtual
> + * address 2M-aligned.
> + */
> +static u64 efi_map_start_addr(efi_memory_desc_t *md, u64 addr)
> +{
> +	u64 tail;
> +
> +	tail = md->phys_addr & (PMD_SIZE - 1);
> +
> +	/* Is physical address 2M-aligned? */
> +	if (!tail) {
> +		addr = roundup(addr, PMD_SIZE);
> +	} else {
> +		u64 prev_addr = addr;
> +
> +		/* get us the same offset within this 2M page */
> +		addr = (addr & PMD_MASK) + tail;
> +
> +		/*
> +		 * Roll forward to the next PMD.
> +		 */
> +		if (addr < prev_addr)
> +			addr += PMD_SIZE;
> +	}
> +
> +	return addr;
> +}
> +
>  void __init efi_map_region(efi_memory_desc_t *md)
>  {
>  	unsigned long size = md->num_pages << PAGE_SHIFT;
> -	u64 pa = md->phys_addr;
>  
>  	if (efi_enabled(EFI_OLD_MEMMAP))
>  		return old_map_region(md);
>  
> +	/* Has efi_map_calculate_base() been called? */
> +	if (WARN_ON_ONCE(efi_va == EFI_VA_START))
> +		return;
> +
>  	/*
>  	 * Make sure the 1:1 mappings are present as a catch-all for b0rked
>  	 * firmware which doesn't update all internal pointers after switching
> @@ -239,23 +277,9 @@ void __init efi_map_region(efi_memory_desc_t *md)
>  		return;
>  	}
>  
> -	efi_va -= size;
> -
> -	/* Is PA 2M-aligned? */
> -	if (!(pa & (PMD_SIZE - 1))) {
> -		efi_va &= PMD_MASK;
> -	} else {
> -		u64 pa_offset = pa & (PMD_SIZE - 1);
> -		u64 prev_va = efi_va;
> -
> -		/* get us the same offset within this 2M page */
> -		efi_va = (efi_va & PMD_MASK) + pa_offset;
> -
> -		if (efi_va > prev_va)
> -			efi_va -= PMD_SIZE;
> -	}
> +	efi_va = efi_map_start_addr(md, efi_va);
>  
> -	if (efi_va < EFI_VA_END) {
> +	if (efi_va > EFI_VA_START) {
>  		pr_warn(FW_WARN "VA address range overflow!\n");
>  		return;
>  	}
> @@ -263,6 +287,57 @@ void __init efi_map_region(efi_memory_desc_t *md)
>  	/* Do the VA map */
>  	__map_region(md, efi_va);
>  	md->virt_addr = efi_va;
> +	efi_va += size;
> +}
> +
> +/**
> + * efi_map_calculate_base - Find the base address to map EFI regions
> + *
> + * This function calculates how much virtual address space is required
> + * to map all the EFI regions for runtime. We map those regions as
> + * close to each other as possible while sticking with the PMD
> + * alignment and ensuring we end at EFI_VA_START.
> + *
> + * On return we set 'efi_va' to the start address of the virtual
> + * address space where efi_map_region() will begin mapping things.
> + *
> + * Beginning in UEFI v2.5 the EFI_PROPERTIES_TABLE config table
> + * feature requires us to map all entries in the same order as they
> + * appear in the EFI memory map. That is to say, entry N must have a
> + * lower virtual address than entry N+1. This is because the firmware
> + * toolchain leaves relative references in the code/data sections,
> + * which are split and become separate EFI memory regions. Mapping
> + * things out-of-order leads to the firmware accessing unmapped
> + * addresses.
> + *
> + * We need to map things this way whether or not we actually make use
> + * of the EFI_PROPERTIES_TABLE feature.
> + *
> + * Call this function before invoking efi_map_region().
> + */
> +void __init efi_map_calculate_base(void)
> +{
> +	efi_memory_desc_t *md;
> +	u64 size = 0;
> +
> +	/*
> +	 * We don't need to place the mappings this carefully for the
> +	 * old mapping scheme.
> +	 */
> +	if (efi_enabled(EFI_OLD_MEMMAP))
> +		return;
> +
> +	for_each_efi_memory_desc(&memmap, md) {
> +		if (!(md->attribute & EFI_MEMORY_RUNTIME) &&
> +		    md->type != EFI_BOOT_SERVICES_CODE &&
> +		    md->type != EFI_BOOT_SERVICES_DATA)
> +			continue;
> +
> +		size = efi_map_start_addr(md, size);
> +		size += md->num_pages << PAGE_SHIFT;
> +	}
> +
> +	efi_va -= size;
>  }
>  
>  /*
> -- 
> 2.1.0
> 
> --

Thanks a lot!
Joey Lee
--
To unsubscribe from this list: send the line "unsubscribe stable" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Kernel]     [Kernel Development Newbies]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite Hiking]     [Linux Kernel]     [Linux SCSI]