Re: [PATCH] efi/arm: fix allocation failure when reserving the kernel base

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



(adding Mike)

On Thu, 15 Aug 2019 at 14:28, Chester Lin <clin@xxxxxxxx> wrote:
>
> Hi Ard,
>
> On Thu, Aug 15, 2019 at 10:59:43AM +0300, Ard Biesheuvel wrote:
> > On Sun, 4 Aug 2019 at 10:57, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
> > >
> > > Hello Chester,
> > >
> > > On Fri, 2 Aug 2019 at 08:40, Chester Lin <clin@xxxxxxxx> wrote:
> > > >
> > > > In some cases the arm32 efistub could fail to allocate memory for
> > > > uncompressed kernel. For example, we got the following error message when
> > > > verifying EFI stub on Raspberry Pi-2 [kernel-5.2.1 + grub-2.04] :
> > > >
> > > >   EFI stub: Booting Linux Kernel...
> > > >   EFI stub: ERROR: Unable to allocate memory for uncompressed kernel.
> > > >   EFI stub: ERROR: Failed to relocate kernel
> > > >
> > > > After checking the EFI memory map we found that the first page [0 - 0xfff]
> > > > had been reserved by Raspberry Pi-2's firmware, and the efistub tried to
> > > > set the dram base at 0, which was actually in a reserved region.
> > > >
> > >
> > > This by itself is a violation of the Linux boot protocol for 32-bit
> > > ARM when using the decompressor. The decompressor rounds down its own
> > > base address to a multiple of 128 MB, and assumes the whole area is
> > > available for the decompressed kernel and related data structures.
> > > (The first TEXT_OFFSET bytes are no longer used in practice, which is
> > > why putting a reserved region of 4 KB bytes works at the moment, but
> > > this is fragile). Note that the decompressor does not look at any DT
> > > or EFI provided memory maps *at all*.
> > >
> > > So unfortunately, this is not something we can fix in the kernel, but
> > > we should fix it in the bootloader or in GRUB, so it does not put any
> > > reserved regions in the first 128 MB of memory,
> > >
> >
> > OK, perhaps we can fix this by taking TEXT_OFFSET into account. The
> > ARM boot protocol docs are unclear about whether this memory should be
> > used or not, but it is no longer used for its original purpose (page
> > tables), and the RPi loader already keeps data there.
> >
> > Can you check whether the following patch works for you?
> >
> > diff --git a/drivers/firmware/efi/libstub/Makefile
> > b/drivers/firmware/efi/libstub/Makefile
> > index 0460c7581220..ee0661ddb25b 100644
> > --- a/drivers/firmware/efi/libstub/Makefile
> > +++ b/drivers/firmware/efi/libstub/Makefile
> > @@ -52,6 +52,7 @@ lib-$(CONFIG_EFI_ARMSTUB)     += arm-stub.o fdt.o
> > string.o random.o \
> >
> >  lib-$(CONFIG_ARM)              += arm32-stub.o
> >  lib-$(CONFIG_ARM64)            += arm64-stub.o
> > +CFLAGS_arm32-stub.o            := -DTEXT_OFFSET=$(TEXT_OFFSET)
> >  CFLAGS_arm64-stub.o            := -DTEXT_OFFSET=$(TEXT_OFFSET)
> >
> >  #
> > diff --git a/drivers/firmware/efi/libstub/arm32-stub.c
> > b/drivers/firmware/efi/libstub/arm32-stub.c
> > index e8f7aefb6813..66ff0c8ec269 100644
> > --- a/drivers/firmware/efi/libstub/arm32-stub.c
> > +++ b/drivers/firmware/efi/libstub/arm32-stub.c
> > @@ -204,7 +204,7 @@ efi_status_t
> > handle_kernel_image(efi_system_table_t *sys_table,
> >          * loaded. These assumptions are made by the decompressor,
> >          * before any memory map is available.
> >          */
> > -       dram_base = round_up(dram_base, SZ_128M);
> > +       dram_base = round_up(dram_base, SZ_128M) + TEXT_OFFSET;
> >
> >         status = reserve_kernel_base(sys_table, dram_base, reserve_addr,
> >                                      reserve_size);
> >
>
> I tried your patch on rpi2 and got the following panic. Just a reminder that I
> have replaced some log messages with "......" since it might be too long to
> post all.
>

OK. Good to know that this change helps you to get past the EFI stub boot issue.

> In this case the kernel failed to reserve cma, which should hit the issue of
> memblock_limit=0x1000 as I had mentioned in my patch description. The first
> block [0-0xfff] was scanned in adjust_lowmem_bounds(), but it did not align
> with PMD_SIZE so the cma reservation failed because the memblock.current_limit
> was extremely low. That's why I expand the first reservation from 1 PAGESIZE to
> 1 PMD_SIZE in my patch in order to avoid this issue. Please kindly let me know
> if any suggestion, thank you.
>

This looks like it is a separate issue. The memblock/cma code should
not choke on a reserved page of memory at 0x0.

Perhaps Russell or Mike (cc'ed) have an idea how to address this?



> boot-log:
> --------
>
> Loading Linux test ...
> EFI stub: Booting Linux Kernel...
> EFI stub: Using DTB from configuration table
> EFI stub: Exiting boot services and installing virtual address map...
> Uncompressing Linux... done, booting the kernel.
> [    0.000000] Booting Linux on physical CPU 0xf00
> [    0.000000] Linux version 5.2.1-lpae (chester@linux-8mug) (......)
> [    0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=30c5387d
> [    0.000000] CPU: div instructions available: patching division code
> [    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
> [    0.000000] OF: fdt: Machine model: Raspberry Pi 2 Model B Rev 1.1
> [    0.000000] printk: bootconsole [earlycon0] enabled
> [    0.000000] Memory policy: Data cache writealloc
> [    0.000000] efi: Getting EFI parameters from FDT:
> [    0.000000] efi:   System Table: 0x000000003df757c0
> [    0.000000] efi:   MemMap Address: 0x000000002c1c5040
> [    0.000000] efi:   MemMap Size: 0x000003c0
> [    0.000000] efi:   MemMap Desc. Size: 0x00000028
> [    0.000000] efi:   MemMap Desc. Version: 0x00000001
> [    0.000000] efi: EFI v2.70 by Das U-Boot
> [    0.000000] efi:  SMBIOS=0x3cb62000  MEMRESERVE=0x3cb3d040
> [    0.000000] memblock_reserve: [0x000000003cb3d040-0x000000003cb3d04f] efi_config_parse_tables+0x25c/0x2d8
> [    0.000000] efi: Processing EFI memory map:
> [    0.000000] MEMBLOCK configuration:
> [    0.000000]  memory size = 0x000000003e000000 reserved size = 0x0000000000000010
> [    0.000000]  memory.cnt  = 0x1
> [    0.000000]  memory[0x0]     [0x0000000000000000-0x000000003dffffff], 0x000000003e000000 bytes flags: 0x0
> [    0.000000]  reserved.cnt  = 0x1
> [    0.000000]  reserved[0x0]   [0x000000003cb3d040-0x000000003cb3d04f], 0x0000000000000010 bytes flags: 0x0
> [    0.000000] memblock_remove: [0x0000000000000000-0xfffffffffffffffe] reserve_regions+0x68/0x23c
> [    0.000000] efi:   0x000000000000-0x000000000fff [Reserved           |   |  |  |  |  |  |  |   |WB|  |  |  ]
> [    0.000000] memblock_add: [0x0000000000000000-0x0000000000000fff] early_init_dt_add_memory_arch+0x164/0x178
> [    0.000000] efi:   0x000000001000-0x000000307fff [Conventional Memory|   |  |  |  |  |  |  |   |WB|  |  |  ]
> [    0.000000] memblock_add: [0x0000000000001000-0x0000000000307fff] early_init_dt_add_memory_arch+0x164/0x178
> [    0.000000] efi:   0x000000308000-0x000002307fff [Boot Data          |   |  |  |  |  |  |  |   |WB|  |  |  ]
> [    0.000000] memblock_add: [0x0000000000308000-0x0000000002307fff] early_init_dt_add_memory_arch+0x164/0x178
> [    0.000000] efi:   0x000002308000-0x000002a93fff [Loader Data        |   |  |  |  |  |  |  |   |WB|  |  |  ]
> [    0.000000] memblock_add: [0x0000000002308000-0x0000000002a93fff] early_init_dt_add_memory_arch+0x164/0x178
> [    0.000000] efi:   0x000002a94000-0x000007cf5fff [Conventional Memory|   |  |  |  |  |  |  |   |WB|  |  |  ]
> [    0.000000] memblock_add: [0x0000000002a94000-0x0000000007cf5fff] early_init_dt_add_memory_arch+0x164/0x178
> ......
> ......
> [    0.000000] memblock_add: [0x000000003df76000-0x000000003dffffff] early_init_dt_add_memory_arch+0x164/0x178
> [    0.000000] efi:   0x00003f100000-0x00003f100fff [Memory Mapped I/O  |RUN|  |  |  |  |  |  |   |  |  |  |  ]
> [    0.000000] memblock_reserve: [0x000000002c1c5000-0x000000002c1c5fff] efi_init+0xd8/0x1c8
> [    0.000000] memblock_reserve: [0x0000000000400000-0x0000000001df2cef] arm_memblock_init+0x44/0x19c
> [    0.000000] memblock_reserve: [0x0000000000303000-0x0000000000307fff] arm_mm_memblock_reserve+0x30/0x38
> [    0.000000] memblock_reserve: [0x0000000007cf6000-0x0000000007cfc5c4] early_init_dt_reserve_memory_arch+0x2c/0x30
> [    0.000000] cma: Failed to reserve 64 MiB
> [    0.000000] MEMBLOCK configuration:
> [    0.000000]  memory size = 0x000000003e000000 reserved size = 0x00000000019ff2c5
> [    0.000000]  memory.cnt  = 0xa
> [    0.000000]  memory[0x0]     [0x0000000000000000-0x0000000000000fff], 0x0000000000001000 bytes flags: 0x4
> [    0.000000]  memory[0x1]     [0x0000000000001000-0x0000000007ef5fff], 0x0000000007ef5000 bytes flags: 0x0
> [    0.000000]  memory[0x2]     [0x0000000007ef6000-0x0000000007f09fff], 0x0000000000014000 bytes flags: 0x4
> [    0.000000]  memory[0x3]     [0x0000000007f0a000-0x000000003cb3efff], 0x0000000034c35000 bytes flags: 0x0
> [    0.000000]  memory[0x4]     [0x000000003cb3f000-0x000000003cb3ffff], 0x0000000000001000 bytes flags: 0x4
> [    0.000000]  memory[0x5]     [0x000000003cb40000-0x000000003cb5ffff], 0x0000000000020000 bytes flags: 0x0
> [    0.000000]  memory[0x6]     [0x000000003cb60000-0x000000003cb68fff], 0x0000000000009000 bytes flags: 0x4
> [    0.000000]  memory[0x7]     [0x000000003cb69000-0x000000003df74fff], 0x000000000140c000 bytes flags: 0x0
> [    0.000000]  memory[0x8]     [0x000000003df75000-0x000000003df75fff], 0x0000000000001000 bytes flags: 0x4
> [    0.000000]  memory[0x9]     [0x000000003df76000-0x000000003dffffff], 0x000000000008a000 bytes flags: 0x0
> [    0.000000]  reserved.cnt  = 0x5
> [    0.000000]  reserved[0x0]   [0x0000000000303000-0x0000000000307fff], 0x0000000000005000 bytes flags: 0x0
> [    0.000000]  reserved[0x1]   [0x0000000000400000-0x0000000001df2cef], 0x00000000019f2cf0 bytes flags: 0x0
> [    0.000000]  reserved[0x2]   [0x0000000007cf6000-0x0000000007cfc5c4], 0x00000000000065c5 bytes flags: 0x0
> [    0.000000]  reserved[0x3]   [0x000000002c1c5000-0x000000002c1c5fff], 0x0000000000001000 bytes flags: 0x0
> [    0.000000]  reserved[0x4]   [0x000000003cb3d040-0x000000003cb3d04f], 0x0000000000000010 bytes flags: 0x0
> [    0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 early_alloc+0x44/0x70
> [    0.000000] Kernel panic - not syncing: early_alloc: Failed to allocate 4096 bytes align=0x1000
> [    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.1-lpae #1 openSUSE Tumbleweed (unreleased)
> [    0.000000] Hardware name: BCM2835
> [    0.000000] Backtrace:
> [    0.000000] [<c043fafc>] (dump_backtrace) from [<c043fd84>] (show_stack+0x20/0x24)
> [    0.000000]  r7:c1800000 r6:00000000 r5:600001d3 r4:c1901ba0
> [    0.000000] [<c043fd64>] (show_stack) from [<c0df9400>] (dump_stack+0xd0/0x104)
> [    0.000000] [<c0df9330>] (dump_stack) from [<c048061c>] (panic+0xf8/0x32c)
> [    0.000000]  r10:c0307000 r9:c0001000 r8:00000003 r7:00000000 r6:00000000 r5:c181df04
> [    0.000000]  r4:c192b8d8 r3:00000001
> [    0.000000] [<c0480528>] (panic) from [<c1609728>] (early_alloc+0x60/0x70)
> [    0.000000]  r3:00001000 r2:00001000 r1:c10037e8 r0:c12fe64c
> [    0.000000]  r7:00000000
> [    0.000000] [<c16096c8>] (early_alloc) from [<c1609114>] (arm_pte_alloc+0x34/0x94)
> [    0.000000]  r7:00000000 r6:00000000 r4:c0307000
> [    0.000000] [<c16090e0>] (arm_pte_alloc) from [<c1609384>] (__create_mapping+0x210/0x2c0)
> [    0.000000]  r9:c0001000 r8:c0001000 r7:00000001 r6:c13f22e0 r5:c0200000 r4:c0400000
> [    0.000000] [<c1609174>] (__create_mapping) from [<c160951c>] (create_mapping+0xe8/0x108)
> [    0.000000]  r10:c0400000 r9:c16a2110 r8:c19c7a80 r7:00000000 r6:00400000 r5:c13f2000
> [    0.000000]  r4:c1801ef0
> [    0.000000] [<c1609434>] (create_mapping) from [<c1609f50>] (paging_init+0x350/0x75c)
> [    0.000000]  r4:c1842d40
>
>
> > >
> > > >   grub> lsefimmap
> > > >   Type      Physical start  - end             #Pages        Size Attributes
> > > >   reserved  0000000000000000-0000000000000fff 00000001      4KiB WB
> > > >   conv-mem  0000000000001000-0000000007ef5fff 00007ef5 130004KiB WB
> > > >   RT-data   0000000007ef6000-0000000007f09fff 00000014     80KiB RT WB
> > > >   conv-mem  0000000007f0a000-000000002d871fff 00025968 615840KiB WB
> > > >   .....
> > > >
> > > > To avoid a reserved address, we have to ignore the memory regions which are
> > > > marked as EFI_RESERVED_TYPE, and only conventional memory regions can be
> > > > chosen. If the region before the kernel base is unaligned, it will be
> > > > marked as EFI_RESERVED_TYPE and let kernel ignore it so that memblock_limit
> > > > will not be sticked with a very low address such as 0x1000.
> > > >
> >
> > This is a separate issue, so it should be handled in a separate patch.
> >
> > > > Signed-off-by: Chester Lin <clin@xxxxxxxx>
> > > > ---
> > > >  arch/arm/mm/mmu.c                         |  3 ++
> > > >  drivers/firmware/efi/libstub/arm32-stub.c | 43 ++++++++++++++++++-----
> > > >  2 files changed, 37 insertions(+), 9 deletions(-)
> > > >
> > > > diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
> > > > index f3ce34113f89..909b11ba48d8 100644
> > > > --- a/arch/arm/mm/mmu.c
> > > > +++ b/arch/arm/mm/mmu.c
> > > > @@ -1184,6 +1184,9 @@ void __init adjust_lowmem_bounds(void)
> > > >                 phys_addr_t block_start = reg->base;
> > > >                 phys_addr_t block_end = reg->base + reg->size;
> > > >
> > > > +               if (memblock_is_nomap(reg))
> > > > +                       continue;
> > > > +
> > > >                 if (reg->base < vmalloc_limit) {
> > > >                         if (block_end > lowmem_limit)
> > > >                                 /*
> > > > diff --git a/drivers/firmware/efi/libstub/arm32-stub.c b/drivers/firmware/efi/libstub/arm32-stub.c
> > > > index e8f7aefb6813..10d33d36df00 100644
> > > > --- a/drivers/firmware/efi/libstub/arm32-stub.c
> > > > +++ b/drivers/firmware/efi/libstub/arm32-stub.c
> > > > @@ -128,7 +128,7 @@ static efi_status_t reserve_kernel_base(efi_system_table_t *sys_table_arg,
> > > >
> > > >         for (l = 0; l < map_size; l += desc_size) {
> > > >                 efi_memory_desc_t *desc;
> > > > -               u64 start, end;
> > > > +               u64 start, end, spare, kernel_base;
> > > >
> > > >                 desc = (void *)memory_map + l;
> > > >                 start = desc->phys_addr;
> > > > @@ -144,27 +144,52 @@ static efi_status_t reserve_kernel_base(efi_system_table_t *sys_table_arg,
> > > >                 case EFI_BOOT_SERVICES_DATA:
> > > >                         /* Ignore types that are released to the OS anyway */
> > > >                         continue;
> > > > -
> > > > +               case EFI_RESERVED_TYPE:
> > > > +                       /* Ignore reserved regions */
> > > > +                       continue;
> > > >                 case EFI_CONVENTIONAL_MEMORY:
> > > >                         /*
> > > >                          * Reserve the intersection between this entry and the
> > > >                          * region.
> > > >                          */
> > > >                         start = max(start, (u64)dram_base);
> > > > -                       end = min(end, (u64)dram_base + MAX_UNCOMP_KERNEL_SIZE);
> > > > +                       kernel_base = round_up(start, PMD_SIZE);
> > > > +                       spare = kernel_base - start;
> > > > +                       end = min(end, kernel_base + MAX_UNCOMP_KERNEL_SIZE);
> > > > +
> > > > +                       status = efi_call_early(allocate_pages,
> > > > +                                       EFI_ALLOCATE_ADDRESS,
> > > > +                                       EFI_LOADER_DATA,
> > > > +                                       MAX_UNCOMP_KERNEL_SIZE / EFI_PAGE_SIZE,
> > > > +                                       &kernel_base);
> > > > +                       if (status != EFI_SUCCESS) {
> > > > +                               pr_efi_err(sys_table_arg,
> > > > +                                       "reserve_kernel_base: alloc failed.\n");
> > > > +                               goto out;
> > > > +                       }
> > > > +                       *reserve_addr = kernel_base;
> > > >
> > > > +                       if (!spare)
> > > > +                               break;
> > > > +                       /*
> > > > +                        * If there's a gap between start and kernel_base,
> > > > +                        * it needs be reserved so that the memblock_limit
> > > > +                        * will not fall on a very low address when running
> > > > +                        * adjust_lowmem_bounds(), wchich could eventually
> > > > +                        * cause CMA reservation issue.
> > > > +                        */
> > > >                         status = efi_call_early(allocate_pages,
> > > >                                                 EFI_ALLOCATE_ADDRESS,
> > > > -                                               EFI_LOADER_DATA,
> > > > -                                               (end - start) / EFI_PAGE_SIZE,
> > > > +                                               EFI_RESERVED_TYPE,
> > > > +                                               spare / EFI_PAGE_SIZE,
> > > >                                                 &start);
> > > >                         if (status != EFI_SUCCESS) {
> > > >                                 pr_efi_err(sys_table_arg,
> > > > -                                       "reserve_kernel_base(): alloc failed.\n");
> > > > +                                       "reserve spare-region failed\n");
> > > >                                 goto out;
> > > >                         }
> > > > -                       break;
> > > >
> > > > +                       break;
> > > >                 case EFI_LOADER_CODE:
> > > >                 case EFI_LOADER_DATA:
> > > >                         /*
> > > > @@ -220,7 +245,7 @@ efi_status_t handle_kernel_image(efi_system_table_t *sys_table,
> > > >         *image_size = image->image_size;
> > > >         status = efi_relocate_kernel(sys_table, image_addr, *image_size,
> > > >                                      *image_size,
> > > > -                                    dram_base + MAX_UNCOMP_KERNEL_SIZE, 0);
> > > > +                                    *reserve_addr + MAX_UNCOMP_KERNEL_SIZE, 0);
> > > >         if (status != EFI_SUCCESS) {
> > > >                 pr_efi_err(sys_table, "Failed to relocate kernel.\n");
> > > >                 efi_free(sys_table, *reserve_size, *reserve_addr);
> > > > @@ -233,7 +258,7 @@ efi_status_t handle_kernel_image(efi_system_table_t *sys_table,
> > > >          * in memory. The kernel determines the base of DRAM from the
> > > >          * address at which the zImage is loaded.
> > > >          */
> > > > -       if (*image_addr + *image_size > dram_base + ZIMAGE_OFFSET_LIMIT) {
> > > > +       if (*image_addr + *image_size > *reserve_addr + ZIMAGE_OFFSET_LIMIT) {
> > > >                 pr_efi_err(sys_table, "Failed to relocate kernel, no low memory available.\n");
> > > >                 efi_free(sys_table, *reserve_size, *reserve_addr);
> > > >                 *reserve_size = 0;
> > > > --
> > > > 2.22.0
> > > >
> >



[Index of Archives]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux OMAP]     [Linux MIPS]     [ECOS]     [Asterisk Internet PBX]     [Linux API]

  Powered by Linux