Re: kexec_file overwrites reserved EFI ESRT memory

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Add more cc
On 12/02/19 at 04:58pm, Dave Young wrote:
> On 11/29/19 at 04:27pm, Michael Weiser wrote:
> > Hello Dave,
> > 
> > On Mon, Nov 25, 2019 at 01:52:01PM +0800, Dave Young wrote:
> > 
> > > > > Fundamentally when deciding where to place a new kernel kexec (either
> > > > > user space or the in kernel kexec_file implementation) needs to be able
> > > > > to ask the question which memory ares are reserved.
> > [...]
> > > > > So my question is why doesn't the ESRT reservation wind up in
> > > > > /proc/iomem?
> > > > 
> > > > My guess is that the focus was that some EFI structures need to be kept
> > > > around accross the life cycle of *one* running kernel and
> > > > memblock_reserve() was enough for that. Marking them so they survive
> > > > kexecing another kernel might just never have cropped up thus far. Ard
> > > > or Matt would know.
> > > Can you check your un-reserved memory, if your memory falls into EFI
> > > BOOT* then in X86 you can use something like below if it is not covered:
> > 
> > > void __init efi_esrt_init(void)
> > > {
> > > ...
> > > 	pr_info("Reserving ESRT space from %pa to %pa.\n", &esrt_data, &end);
> > > 	if (md.type == EFI_BOOT_SERVICES_DATA)
> > > 		efi_mem_reserve(esrt_data, esrt_data_size);
> > > ...
> > > }
> > 
> > Please bear with me if I'm a bit slow on the uptake here: On my machine,
> > the esrt module reports at boot:
> > 
> > [    0.001244] esrt: Reserving ESRT space from 0x0000000074dd2f98 to 0x0000000074dd2fd0.
> > 
> > This area is of type "Boot Data" (== BOOT_SERVICES_DATA) which makes the
> > code you quote reserve it using memblock_reserve() shown by
> > memblock=debug:
> > 
> > [    0.001246] memblock_reserve: [0x0000000074dd2f98-0x0000000074dd2fcf] efi_mem_reserve+0x1d/0x2b
> > 
> > It also calls into arch/x86/platform/efi/quirks.c:efi_arch_mem_reserve()
> > which tags it as EFI_MEMORY_RUNTIME while the surrounding ones aren't
> > as shown by efi=debug:
> > 
> > [    0.178111] efi: mem10: [Boot Data          |   |  |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000074dd3000-0x0000000075becfff] (14MB)
> > [    0.178113] efi: mem11: [Boot Data          |RUN|  |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x0000000074dd2000-0x0000000074dd2fff] (0MB)
> > [    0.178114] efi: mem12: [Boot Data          |   |  |  |  |  |  |  |  |   |WB|WT|WC|UC] range=[0x000000006d635000-0x0000000074dd1fff] (119MB)
> > 
> > This prevents arch/x86/platform/efi/quirks.c:efi_free_boot_services()
> > from calling __memblock_free_late() on it. And indeed, memblock=debug does
> > not report this area as being free'd while the surrounding ones are:
> > 
> > [    0.178369] __memblock_free_late: [0x0000000074dd3000-0x0000000075becfff] efi_free_boot_services+0x126/0x1f8
> > [    0.178658] __memblock_free_late: [0x000000006d635000-0x0000000074dd1fff] efi_free_boot_services+0x126/0x1f8
> > 
> > The esrt area does not show up in /proc/iomem though:
> > 
> > 00100000-763f5fff : System RAM
> >   62000000-62a00d80 : Kernel code
> >   62c00000-62f15fff : Kernel rodata
> >   63000000-630ea8bf : Kernel data
> >   63fed000-641fffff : Kernel bss
> >   65000000-6affffff : Crash kernel
> > 
> > And thus kexec loads the new kernel right over that area as shown when
> > enabling -DDEBUG on kexec_file.c (0x74dd3000 being inbetween 0x73000000
> > and 0x73000000+0x24be000 = 0x754be000):
> > 
> > [  650.007695] kexec_file: Loading segment 0: buf=0x000000003a9c84d6 bufsz=0x5000 mem=0x98000 memsz=0x6000
> > [  650.007699] kexec_file: Loading segment 1: buf=0x0000000017b2b9e6 bufsz=0x1240 mem=0x96000 memsz=0x2000
> > [  650.007703] kexec_file: Loading segment 2: buf=0x00000000fdf72ba2 bufsz=0x1150888 mem=0x73000000 memsz=0x24be000
> > 
> > ... because it looks for any memory hole large enough in iomem resources
> > tagged as System RAM, which 0x74dd2000-0x74dd2fff would then need to be
> > excluded from on my system.
> > 
> > Looking some more at efi_arch_mem_reserve() I see that it also registers
> > the area with efi.memmap and installs it using efi_memmap_install().
> > which seems to call memremap(MEMREMAP_WB) on it. From my understanding
> > of the comments in the source of memremap(), MEMREMAP_WB does specifically
> > *not* reserve that memory in any way.
> > 
> > > Unfortunately I noticed there are different requirements/ways for
> > > different types of "reserved" memory.  But that is another topic..
> > 
> > I tried to reserve the area with something like this:
> > 
> > t a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> > index 4de244683a7e..b86a5df027a2 100644
> > --- a/arch/x86/platform/efi/quirks.c
> > +++ b/arch/x86/platform/efi/quirks.c
> > @@ -249,6 +249,7 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
> >         efi_memory_desc_t md;
> >         int num_entries;
> >         void *new;
> > +       struct resource *res;
> >  
> >         if (efi_mem_desc_lookup(addr, &md) ||
> >             md.type != EFI_BOOT_SERVICES_DATA) {
> > @@ -294,6 +295,21 @@ void __init efi_arch_mem_reserve(phys_addr_t addr, u64 size)
> >         early_memunmap(new, new_size);
> >  
> >         efi_memmap_install(new_phys, num_entries);
> > +
> > +       res = memblock_alloc(sizeof(*res), SMP_CACHE_BYTES);
> > +       if (!res) {
> > +               pr_err("Failed to allocate EFI io resource allocator for "
> > +                               "0x%llx:0x%llx", mr.range.start, mr.range.end);
> > +               return;
> > +       }
> > +
> > +       res->start      = mr.range.start;
> > +       res->end        = mr.range.end;
> > +       res->name       = "EFI runtime";
> > +       res->flags      = IORESOURCE_MEM | IORESOURCE_BUSY;
> > +       res->desc       = IORES_DESC_NONE;
> > +
> > +       insert_resource(&iomem_resource, res);
> >  }
> >  
> >  /*
> > 
> > ... but failed miserably in terms of the kernel not booting because I
> > have no experience whatsoever in programming and debugging early kernel
> > init. But I am somewhat keen to ride the learning curve here. :)
> > 
> > Am I on the right track or were you a couple of leaps ahead of me
> > already and I just didn't get the question?
> 
> It seems a serious problem, the EFI modified memmap does not get an
> /proc/iomem resource update, but kexec_file relies on /proc/iomem in
> X86.
> 
> Can you try below diff see if it works for you? (not tested, and need
> explicitly 'add_efi_memmap' in kernel cmdline param)
> 
> There is an question from Sai about why add_efi_memmap is not enabled by
> default:
> https://www.spinics.net/lists/linux-mm/msg185166.html
> 
> Long time ago the add_efi_memmap is only enabled in case we explict
> enable it on cmdline, I'm not sure if we can do it by default, maybe we
> should.   Need opinion from X86 maintainers..
> 
> diff --git a/arch/x86/include/asm/efi.h b/arch/x86/include/asm/efi.h
> index 43a82e59c59d..eddaac6131cf 100644
> --- a/arch/x86/include/asm/efi.h
> +++ b/arch/x86/include/asm/efi.h
> @@ -243,6 +243,7 @@ static inline bool efi_is_64bit(void)
>  
>  extern bool efi_reboot_required(void);
>  extern bool efi_is_table_address(unsigned long phys_addr);
> +extern void do_add_efi_memmap(void);
>  
>  #else
>  static inline void parse_efi_setup(u64 phys_addr, u32 data_len) {}
> diff --git a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
> index 425e025341db..39e28ec76522 100644
> --- a/arch/x86/platform/efi/efi.c
> +++ b/arch/x86/platform/efi/efi.c
> @@ -149,10 +149,12 @@ void __init efi_find_mirror(void)
>   * (zeropage) memory map.
>   */
>  
> -static void __init do_add_efi_memmap(void)
> +void __init do_add_efi_memmap(void)
>  {
>  	efi_memory_desc_t *md;
>  
> +	if (!add_efi_memmap)
> +		return;
>  	for_each_efi_memory_desc(md) {
>  		unsigned long long start = md->phys_addr;
>  		unsigned long long size = md->num_pages << EFI_PAGE_SHIFT;
> @@ -224,8 +226,7 @@ int __init efi_memblock_x86_reserve_range(void)
>  	if (rv)
>  		return rv;
>  
> -	if (add_efi_memmap)
> -		do_add_efi_memmap();
> +	do_add_efi_memmap();
>  
>  	WARN(efi.memmap.desc_version != 1,
>  	     "Unexpected EFI_MEMORY_DESCRIPTOR version %ld",
> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index 3b9fd679cea9..cfda591e51e3 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -496,6 +496,7 @@ void __init efi_free_boot_services(void)
>  		pr_err("Could not install new EFI memmap\n");
>  		return;
>  	}
> +	do_add_efi_memmap();
>  }
>  
>  /*





[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux