On Thu, Aug 15, 2019 at 02:32:50PM +0300, Ard Biesheuvel wrote:
> (adding Mike)
>
> On Thu, 15 Aug 2019 at 14:28, Chester Lin <clin@xxxxxxxx> wrote:
> >
> > Hi Ard,
> >
> > On Thu, Aug 15, 2019 at 10:59:43AM +0300, Ard Biesheuvel wrote:
> > > On Sun, 4 Aug 2019 at 10:57, Ard Biesheuvel <ard.biesheuvel@xxxxxxxxxx> wrote:
> > > >
> > > > Hello Chester,
> > > >
> > > > On Fri, 2 Aug 2019 at 08:40, Chester Lin <clin@xxxxxxxx> wrote:
> > > > >
> > > > > In some cases the arm32 efistub could fail to allocate memory for
> > > > > uncompressed kernel. For example, we got the following error message when
> > > > > verifying EFI stub on Raspberry Pi-2 [kernel-5.2.1 + grub-2.04] :
> > > > >
> > > > > EFI stub: Booting Linux Kernel...
> > > > > EFI stub: ERROR: Unable to allocate memory for uncompressed kernel.
> > > > > EFI stub: ERROR: Failed to relocate kernel
> > > > >
> > > > > After checking the EFI memory map we found that the first page [0 - 0xfff]
> > > > > had been reserved by Raspberry Pi-2's firmware, and the efistub tried to
> > > > > set the dram base at 0, which was actually in a reserved region.
> > > > >
> > > >
> > > > This by itself is a violation of the Linux boot protocol for 32-bit
> > > > ARM when using the decompressor. The decompressor rounds down its own
> > > > base address to a multiple of 128 MB, and assumes the whole area is
> > > > available for the decompressed kernel and related data structures.
> > > > (The first TEXT_OFFSET bytes are no longer used in practice, which is
> > > > why putting a reserved region of 4 KB bytes works at the moment, but
> > > > this is fragile). Note that the decompressor does not look at any DT
> > > > or EFI provided memory maps *at all*.
> > > >
> > > > So unfortunately, this is not something we can fix in the kernel, but
> > > > we should fix it in the bootloader or in GRUB, so it does not put any
> > > > reserved regions in the first 128 MB of memory,
> > > >
> > >
> > > OK, perhaps we can fix this by taking TEXT_OFFSET into account. The
> > > ARM boot protocol docs are unclear about whether this memory should be
> > > used or not, but it is no longer used for its original purpose (page
> > > tables), and the RPi loader already keeps data there.
> > >
> > > Can you check whether the following patch works for you?
> > >
> > > diff --git a/drivers/firmware/efi/libstub/Makefile
> > > b/drivers/firmware/efi/libstub/Makefile
> > > index 0460c7581220..ee0661ddb25b 100644
> > > --- a/drivers/firmware/efi/libstub/Makefile
> > > +++ b/drivers/firmware/efi/libstub/Makefile
> > > @@ -52,6 +52,7 @@ lib-$(CONFIG_EFI_ARMSTUB) += arm-stub.o fdt.o
> > > string.o random.o \
> > >
> > > lib-$(CONFIG_ARM) += arm32-stub.o
> > > lib-$(CONFIG_ARM64) += arm64-stub.o
> > > +CFLAGS_arm32-stub.o := -DTEXT_OFFSET=$(TEXT_OFFSET)
> > > CFLAGS_arm64-stub.o := -DTEXT_OFFSET=$(TEXT_OFFSET)
> > >
> > > #
> > > diff --git a/drivers/firmware/efi/libstub/arm32-stub.c
> > > b/drivers/firmware/efi/libstub/arm32-stub.c
> > > index e8f7aefb6813..66ff0c8ec269 100644
> > > --- a/drivers/firmware/efi/libstub/arm32-stub.c
> > > +++ b/drivers/firmware/efi/libstub/arm32-stub.c
> > > @@ -204,7 +204,7 @@ efi_status_t
> > > handle_kernel_image(efi_system_table_t *sys_table,
> > > * loaded. These assumptions are made by the decompressor,
> > > * before any memory map is available.
> > > */
> > > - dram_base = round_up(dram_base, SZ_128M);
> > > + dram_base = round_up(dram_base, SZ_128M) + TEXT_OFFSET;
> > >
> > > status = reserve_kernel_base(sys_table, dram_base, reserve_addr,
> > > reserve_size);
> > >
> >
> > I tried your patch on rpi2 and got the following panic. Just a reminder that I
> > have replaced some log messages with "......"
since it might be too long to
> > post all.
> >
>
> OK. Good to know that this change helps you to get past the EFI stub boot issue.
>
> > In this case the kernel failed to reserve cma, which should hit the issue of
> > memblock_limit=0x1000 as I had mentioned in my patch description. The first
> > block [0-0xfff] was scanned in adjust_lowmem_bounds(), but it did not align
> > with PMD_SIZE so the cma reservation failed because the memblock.current_limit
> > was extremely low. That's why I expand the first reservation from 1 PAGESIZE to
> > 1 PMD_SIZE in my patch in order to avoid this issue. Please kindly let me know
> > if any suggestion, thank you.
> This looks like it is a separate issue. The memblock/cma code should
> not choke on a reserved page of memory at 0x0.
>
> Perhaps Russell or Mike (cc'ed) have an idea how to address this?

Presuming that the last memblock dump comes from the end of
arm_memblock_init(), with this memory map:

memory[0x0] [0x0000000000000000-0x0000000000000fff], 0x0000000000001000 bytes flags: 0x4
memory[0x1] [0x0000000000001000-0x0000000007ef5fff], 0x0000000007ef5000 bytes flags: 0x0
memory[0x2] [0x0000000007ef6000-0x0000000007f09fff], 0x0000000000014000 bytes flags: 0x4
memory[0x3] [0x0000000007f0a000-0x000000003cb3efff], 0x0000000034c35000 bytes flags: 0x0

adjust_lowmem_bounds() will set memblock_limit (and, in turn, the global
memblock.current_limit) to 0x1000, and any further memblock_alloc*() will
happily fail.

I believe the memblock_limit calculation assumes that the first bank is at
least several megabytes in size.

I wonder if this hack would help:

diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c
index d9a0038..948e5b9 100644
--- a/arch/arm/mm/mmu.c
+++ b/arch/arm/mm/mmu.c
@@ -1206,7 +1206,7 @@ void __init adjust_lowmem_bounds(void)
 			 * allocated when mapping the start of bank 0, which
 			 * occurs before any free memory is mapped.
 			 */
-			if (!memblock_limit) {
+			if (memblock_limit < PMD_SIZE) {
 				if (!IS_ALIGNED(block_start, PMD_SIZE))
 					memblock_limit = block_start;
 				else if (!IS_ALIGNED(block_end, PMD_SIZE))

> > boot-log:
> > --------
> >
> > Loading Linux test ...
> > EFI stub: Booting Linux Kernel...
> > EFI stub: Using DTB from configuration table
> > EFI stub: Exiting boot services and installing virtual address map...
> > Uncompressing Linux... done, booting the kernel.
> > [ 0.000000] Booting Linux on physical CPU 0xf00
> > [ 0.000000] Linux version 5.2.1-lpae (chester@linux-8mug) (......)
> > [ 0.000000] CPU: ARMv7 Processor [410fc075] revision 5 (ARMv7), cr=30c5387d
> > [ 0.000000] CPU: div instructions available: patching division code
> > [ 0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache
> > [ 0.000000] OF: fdt: Machine model: Raspberry Pi 2 Model B Rev 1.1
> > [ 0.000000] printk: bootconsole [earlycon0] enabled
> > [ 0.000000] Memory policy: Data cache writealloc
> > [ 0.000000] efi: Getting EFI parameters from FDT:
> > [ 0.000000] efi: System Table: 0x000000003df757c0
> > [ 0.000000] efi: MemMap Address: 0x000000002c1c5040
> > [ 0.000000] efi: MemMap Size: 0x000003c0
> > [ 0.000000] efi: MemMap Desc. Size: 0x00000028
> > [ 0.000000] efi: MemMap Desc.
Version: 0x00000001 > > [ 0.000000] efi: EFI v2.70 by Das U-Boot > > [ 0.000000] efi: SMBIOS=0x3cb62000 MEMRESERVE=0x3cb3d040 > > [ 0.000000] memblock_reserve: [0x000000003cb3d040-0x000000003cb3d04f] efi_config_parse_tables+0x25c/0x2d8 > > [ 0.000000] efi: Processing EFI memory map: > > [ 0.000000] MEMBLOCK configuration: > > [ 0.000000] memory size = 0x000000003e000000 reserved size = 0x0000000000000010 > > [ 0.000000] memory.cnt = 0x1 > > [ 0.000000] memory[0x0] [0x0000000000000000-0x000000003dffffff], 0x000000003e000000 bytes flags: 0x0 > > [ 0.000000] reserved.cnt = 0x1 > > [ 0.000000] reserved[0x0] [0x000000003cb3d040-0x000000003cb3d04f], 0x0000000000000010 bytes flags: 0x0 > > [ 0.000000] memblock_remove: [0x0000000000000000-0xfffffffffffffffe] reserve_regions+0x68/0x23c > > [ 0.000000] efi: 0x000000000000-0x000000000fff [Reserved | | | | | | | | |WB| | | ] > > [ 0.000000] memblock_add: [0x0000000000000000-0x0000000000000fff] early_init_dt_add_memory_arch+0x164/0x178 > > [ 0.000000] efi: 0x000000001000-0x000000307fff [Conventional Memory| | | | | | | | |WB| | | ] > > [ 0.000000] memblock_add: [0x0000000000001000-0x0000000000307fff] early_init_dt_add_memory_arch+0x164/0x178 > > [ 0.000000] efi: 0x000000308000-0x000002307fff [Boot Data | | | | | | | | |WB| | | ] > > [ 0.000000] memblock_add: [0x0000000000308000-0x0000000002307fff] early_init_dt_add_memory_arch+0x164/0x178 > > [ 0.000000] efi: 0x000002308000-0x000002a93fff [Loader Data | | | | | | | | |WB| | | ] > > [ 0.000000] memblock_add: [0x0000000002308000-0x0000000002a93fff] early_init_dt_add_memory_arch+0x164/0x178 > > [ 0.000000] efi: 0x000002a94000-0x000007cf5fff [Conventional Memory| | | | | | | | |WB| | | ] > > [ 0.000000] memblock_add: [0x0000000002a94000-0x0000000007cf5fff] early_init_dt_add_memory_arch+0x164/0x178 > > ...... > > ...... 
> > [ 0.000000] memblock_add: [0x000000003df76000-0x000000003dffffff] early_init_dt_add_memory_arch+0x164/0x178 > > [ 0.000000] efi: 0x00003f100000-0x00003f100fff [Memory Mapped I/O |RUN| | | | | | | | | | | ] > > [ 0.000000] memblock_reserve: [0x000000002c1c5000-0x000000002c1c5fff] efi_init+0xd8/0x1c8 > > [ 0.000000] memblock_reserve: [0x0000000000400000-0x0000000001df2cef] arm_memblock_init+0x44/0x19c > > [ 0.000000] memblock_reserve: [0x0000000000303000-0x0000000000307fff] arm_mm_memblock_reserve+0x30/0x38 > > [ 0.000000] memblock_reserve: [0x0000000007cf6000-0x0000000007cfc5c4] early_init_dt_reserve_memory_arch+0x2c/0x30 > > [ 0.000000] cma: Failed to reserve 64 MiB > > [ 0.000000] MEMBLOCK configuration: > > [ 0.000000] memory size = 0x000000003e000000 reserved size = 0x00000000019ff2c5 > > [ 0.000000] memory.cnt = 0xa > > [ 0.000000] memory[0x0] [0x0000000000000000-0x0000000000000fff], 0x0000000000001000 bytes flags: 0x4 > > [ 0.000000] memory[0x1] [0x0000000000001000-0x0000000007ef5fff], 0x0000000007ef5000 bytes flags: 0x0 > > [ 0.000000] memory[0x2] [0x0000000007ef6000-0x0000000007f09fff], 0x0000000000014000 bytes flags: 0x4 > > [ 0.000000] memory[0x3] [0x0000000007f0a000-0x000000003cb3efff], 0x0000000034c35000 bytes flags: 0x0 > > [ 0.000000] memory[0x4] [0x000000003cb3f000-0x000000003cb3ffff], 0x0000000000001000 bytes flags: 0x4 > > [ 0.000000] memory[0x5] [0x000000003cb40000-0x000000003cb5ffff], 0x0000000000020000 bytes flags: 0x0 > > [ 0.000000] memory[0x6] [0x000000003cb60000-0x000000003cb68fff], 0x0000000000009000 bytes flags: 0x4 > > [ 0.000000] memory[0x7] [0x000000003cb69000-0x000000003df74fff], 0x000000000140c000 bytes flags: 0x0 > > [ 0.000000] memory[0x8] [0x000000003df75000-0x000000003df75fff], 0x0000000000001000 bytes flags: 0x4 > > [ 0.000000] memory[0x9] [0x000000003df76000-0x000000003dffffff], 0x000000000008a000 bytes flags: 0x0 > > [ 0.000000] reserved.cnt = 0x5 > > [ 0.000000] reserved[0x0] [0x0000000000303000-0x0000000000307fff], 
0x0000000000005000 bytes flags: 0x0 > > [ 0.000000] reserved[0x1] [0x0000000000400000-0x0000000001df2cef], 0x00000000019f2cf0 bytes flags: 0x0 > > [ 0.000000] reserved[0x2] [0x0000000007cf6000-0x0000000007cfc5c4], 0x00000000000065c5 bytes flags: 0x0 > > [ 0.000000] reserved[0x3] [0x000000002c1c5000-0x000000002c1c5fff], 0x0000000000001000 bytes flags: 0x0 > > [ 0.000000] reserved[0x4] [0x000000003cb3d040-0x000000003cb3d04f], 0x0000000000000010 bytes flags: 0x0 > > [ 0.000000] memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x0000000000000000 max_addr=0x0000000000000000 early_alloc+0x44/0x70 > > [ 0.000000] Kernel panic - not syncing: early_alloc: Failed to allocate 4096 bytes align=0x1000 > > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.2.1-lpae #1 openSUSE Tumbleweed (unreleased) > > [ 0.000000] Hardware name: BCM2835 > > [ 0.000000] Backtrace: > > [ 0.000000] [<c043fafc>] (dump_backtrace) from [<c043fd84>] (show_stack+0x20/0x24) > > [ 0.000000] r7:c1800000 r6:00000000 r5:600001d3 r4:c1901ba0 > > [ 0.000000] [<c043fd64>] (show_stack) from [<c0df9400>] (dump_stack+0xd0/0x104) > > [ 0.000000] [<c0df9330>] (dump_stack) from [<c048061c>] (panic+0xf8/0x32c) > > [ 0.000000] r10:c0307000 r9:c0001000 r8:00000003 r7:00000000 r6:00000000 r5:c181df04 > > [ 0.000000] r4:c192b8d8 r3:00000001 > > [ 0.000000] [<c0480528>] (panic) from [<c1609728>] (early_alloc+0x60/0x70) > > [ 0.000000] r3:00001000 r2:00001000 r1:c10037e8 r0:c12fe64c > > [ 0.000000] r7:00000000 > > [ 0.000000] [<c16096c8>] (early_alloc) from [<c1609114>] (arm_pte_alloc+0x34/0x94) > > [ 0.000000] r7:00000000 r6:00000000 r4:c0307000 > > [ 0.000000] [<c16090e0>] (arm_pte_alloc) from [<c1609384>] (__create_mapping+0x210/0x2c0) > > [ 0.000000] r9:c0001000 r8:c0001000 r7:00000001 r6:c13f22e0 r5:c0200000 r4:c0400000 > > [ 0.000000] [<c1609174>] (__create_mapping) from [<c160951c>] (create_mapping+0xe8/0x108) > > [ 0.000000] r10:c0400000 r9:c16a2110 r8:c19c7a80 r7:00000000 r6:00400000 r5:c13f2000 > > 
[ 0.000000] r4:c1801ef0 > > [ 0.000000] [<c1609434>] (create_mapping) from [<c1609f50>] (paging_init+0x350/0x75c) > > [ 0.000000] r4:c1842d40 > > > > > > > > > > > > > grub> lsefimmap > > > > > Type Physical start - end #Pages Size Attributes > > > > > reserved 0000000000000000-0000000000000fff 00000001 4KiB WB > > > > > conv-mem 0000000000001000-0000000007ef5fff 00007ef5 130004KiB WB > > > > > RT-data 0000000007ef6000-0000000007f09fff 00000014 80KiB RT WB > > > > > conv-mem 0000000007f0a000-000000002d871fff 00025968 615840KiB WB > > > > > ..... > > > > > > > > > > To avoid a reserved address, we have to ignore the memory regions which are > > > > > marked as EFI_RESERVED_TYPE, and only conventional memory regions can be > > > > > chosen. If the region before the kernel base is unaligned, it will be > > > > > marked as EFI_RESERVED_TYPE and let kernel ignore it so that memblock_limit > > > > > will not be sticked with a very low address such as 0x1000. > > > > > > > > > > > This is a separate issue, so it should be handled in a separate patch. 
> > > > > > > > Signed-off-by: Chester Lin <clin@xxxxxxxx> > > > > > --- > > > > > arch/arm/mm/mmu.c | 3 ++ > > > > > drivers/firmware/efi/libstub/arm32-stub.c | 43 ++++++++++++++++++----- > > > > > 2 files changed, 37 insertions(+), 9 deletions(-) > > > > > > > > > > diff --git a/arch/arm/mm/mmu.c b/arch/arm/mm/mmu.c > > > > > index f3ce34113f89..909b11ba48d8 100644 > > > > > --- a/arch/arm/mm/mmu.c > > > > > +++ b/arch/arm/mm/mmu.c > > > > > @@ -1184,6 +1184,9 @@ void __init adjust_lowmem_bounds(void) > > > > > phys_addr_t block_start = reg->base; > > > > > phys_addr_t block_end = reg->base + reg->size; > > > > > > > > > > + if (memblock_is_nomap(reg)) > > > > > + continue; > > > > > + > > > > > if (reg->base < vmalloc_limit) { > > > > > if (block_end > lowmem_limit) > > > > > /* > > > > > diff --git a/drivers/firmware/efi/libstub/arm32-stub.c b/drivers/firmware/efi/libstub/arm32-stub.c > > > > > index e8f7aefb6813..10d33d36df00 100644 > > > > > --- a/drivers/firmware/efi/libstub/arm32-stub.c > > > > > +++ b/drivers/firmware/efi/libstub/arm32-stub.c > > > > > @@ -128,7 +128,7 @@ static efi_status_t reserve_kernel_base(efi_system_table_t *sys_table_arg, > > > > > > > > > > for (l = 0; l < map_size; l += desc_size) { > > > > > efi_memory_desc_t *desc; > > > > > - u64 start, end; > > > > > + u64 start, end, spare, kernel_base; > > > > > > > > > > desc = (void *)memory_map + l; > > > > > start = desc->phys_addr; > > > > > @@ -144,27 +144,52 @@ static efi_status_t reserve_kernel_base(efi_system_table_t *sys_table_arg, > > > > > case EFI_BOOT_SERVICES_DATA: > > > > > /* Ignore types that are released to the OS anyway */ > > > > > continue; > > > > > - > > > > > + case EFI_RESERVED_TYPE: > > > > > + /* Ignore reserved regions */ > > > > > + continue; > > > > > case EFI_CONVENTIONAL_MEMORY: > > > > > /* > > > > > * Reserve the intersection between this entry and the > > > > > * region. 
> > > > > */ > > > > > start = max(start, (u64)dram_base); > > > > > - end = min(end, (u64)dram_base + MAX_UNCOMP_KERNEL_SIZE); > > > > > + kernel_base = round_up(start, PMD_SIZE); > > > > > + spare = kernel_base - start; > > > > > + end = min(end, kernel_base + MAX_UNCOMP_KERNEL_SIZE); > > > > > + > > > > > + status = efi_call_early(allocate_pages, > > > > > + EFI_ALLOCATE_ADDRESS, > > > > > + EFI_LOADER_DATA, > > > > > + MAX_UNCOMP_KERNEL_SIZE / EFI_PAGE_SIZE, > > > > > + &kernel_base); > > > > > + if (status != EFI_SUCCESS) { > > > > > + pr_efi_err(sys_table_arg, > > > > > + "reserve_kernel_base: alloc failed.\n"); > > > > > + goto out; > > > > > + } > > > > > + *reserve_addr = kernel_base; > > > > > > > > > > + if (!spare) > > > > > + break; > > > > > + /* > > > > > + * If there's a gap between start and kernel_base, > > > > > + * it needs be reserved so that the memblock_limit > > > > > + * will not fall on a very low address when running > > > > > + * adjust_lowmem_bounds(), wchich could eventually > > > > > + * cause CMA reservation issue. 
> > > > > + */ > > > > > status = efi_call_early(allocate_pages, > > > > > EFI_ALLOCATE_ADDRESS, > > > > > - EFI_LOADER_DATA, > > > > > - (end - start) / EFI_PAGE_SIZE, > > > > > + EFI_RESERVED_TYPE, > > > > > + spare / EFI_PAGE_SIZE, > > > > > &start); > > > > > if (status != EFI_SUCCESS) { > > > > > pr_efi_err(sys_table_arg, > > > > > - "reserve_kernel_base(): alloc failed.\n"); > > > > > + "reserve spare-region failed\n"); > > > > > goto out; > > > > > } > > > > > - break; > > > > > > > > > > + break; > > > > > case EFI_LOADER_CODE: > > > > > case EFI_LOADER_DATA: > > > > > /* > > > > > @@ -220,7 +245,7 @@ efi_status_t handle_kernel_image(efi_system_table_t *sys_table, > > > > > *image_size = image->image_size; > > > > > status = efi_relocate_kernel(sys_table, image_addr, *image_size, > > > > > *image_size, > > > > > - dram_base + MAX_UNCOMP_KERNEL_SIZE, 0); > > > > > + *reserve_addr + MAX_UNCOMP_KERNEL_SIZE, 0); > > > > > if (status != EFI_SUCCESS) { > > > > > pr_efi_err(sys_table, "Failed to relocate kernel.\n"); > > > > > efi_free(sys_table, *reserve_size, *reserve_addr); > > > > > @@ -233,7 +258,7 @@ efi_status_t handle_kernel_image(efi_system_table_t *sys_table, > > > > > * in memory. The kernel determines the base of DRAM from the > > > > > * address at which the zImage is loaded. > > > > > */ > > > > > - if (*image_addr + *image_size > dram_base + ZIMAGE_OFFSET_LIMIT) { > > > > > + if (*image_addr + *image_size > *reserve_addr + ZIMAGE_OFFSET_LIMIT) { > > > > > pr_efi_err(sys_table, "Failed to relocate kernel, no low memory available.\n"); > > > > > efi_free(sys_table, *reserve_size, *reserve_addr); > > > > > *reserve_size = 0; > > > > > -- > > > > > 2.22.0 > > > > > > > > > -- Sincerely yours, Mike.
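
P.S. To make the effect of the one-line hack concrete, here is a simplified, standalone C model of the memblock_limit scan, run over the first four memory[] regions from the dump. It is only a sketch under stated assumptions, not the kernel code: struct region, scan_memblock_limit() and rpi2_regions are names made up for illustration, PMD_SIZE is assumed to be 2 MiB, and the vmalloc_limit clamp, NOMAP handling and any later rounding of the limit are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: PMD_SIZE is 2 MiB; this value is not taken from the thread. */
#define PMD_SIZE 0x200000ULL
#define IS_ALIGNED(x, a) (((x) & ((a) - 1)) == 0)

/* Hypothetical stand-in for a memblock memory region. */
struct region {
	uint64_t base;
	uint64_t size;
};

/*
 * Simplified model of the memblock_limit scan.
 * relaxed == 0: keep the first candidate found ("!memblock_limit" test).
 * relaxed != 0: keep looking while the candidate is below PMD_SIZE
 *               ("memblock_limit < PMD_SIZE" test from the hack).
 */
static uint64_t scan_memblock_limit(const struct region *regs, size_t count,
				    int relaxed)
{
	uint64_t memblock_limit = 0;
	uint64_t lowmem_limit = 0;
	size_t i;

	assert(regs != NULL || count == 0);

	for (i = 0; i < count; i++) {
		uint64_t block_start = regs[i].base;
		uint64_t block_end = regs[i].base + regs[i].size;
		int keep_looking;

		/* Track the highest end address seen so far. */
		if (block_end > lowmem_limit)
			lowmem_limit = block_end;

		keep_looking = relaxed ? (memblock_limit < PMD_SIZE)
				       : (memblock_limit == 0);
		if (keep_looking) {
			if (!IS_ALIGNED(block_start, PMD_SIZE))
				memblock_limit = block_start;
			else if (!IS_ALIGNED(block_end, PMD_SIZE))
				memblock_limit = lowmem_limit;
		}
	}

	/* No unaligned boundary found: fall back to the end of low memory. */
	if (!memblock_limit)
		memblock_limit = lowmem_limit;

	return memblock_limit;
}

/* First four memory[] entries from the memblock dump in this thread. */
static const struct region rpi2_regions[] = {
	{ 0x00000000ULL, 0x00001000ULL },
	{ 0x00001000ULL, 0x07ef5000ULL },
	{ 0x07ef6000ULL, 0x00014000ULL },
	{ 0x07f0a000ULL, 0x34c35000ULL },
};
```

In this model, scan_memblock_limit(rpi2_regions, 4, 0) stops at 0x1000, the end of the reserved first page, while the relaxed test (last argument 1) skips past it and lands on 0x7ef6000, the next unaligned boundary above PMD_SIZE.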