Hi Ard, Thanks for your reply. Here is the problem I encountered, and the problem still exists in the mainline: Firmware: UEFI 2.4, EFI loader: systemd-boot When I update my kernel from LTS v6.1.x to v6.6.x, the kernel fails to boot when it jumps from the handover protocol. And it fails only on an old Firmware that MemoryOverwriteRequestControl(MOR) bit is set to 1. According to the TCG Reset Attack Mitigation Spec, a Memory Clear Method will be loaded in the memory on boot, and this causes the EFI loader and OS memory address to shift. Another important factor is that my BSS region and boot_idt entries are not as clean as those in QEMU and other platforms. This means that clearing BSS in the handover function can cause the firmware's IDT/GDT corruption, as you mentioned here commit <7b8439b0369b> (x86/efistub: Drop redundant clearing of BSS). I noticed that most of variables in BSS will be initialized before accessing, and only a few are not, that's why I removed memset part in handover entry point. In terms of the BOOT_STACK_SIZE, I found out that the boot_stack region is used in this case (at the beginning of _bss), and 0x4000 is not sufficient to cover this issue since the data in boot_heap will be overwritten during decompression. Prior to image decompression in EFI stub, it was ok. 00000000008fc000 g .bss 0000000000000000 .hidden _bss 00000000008fc000 l O .bss 0000000000004000 boot_stack 0000000000900000 g O .bss 0000000000000004 .hidden spurious_nmi_count 0000000000900000 l O .bss 0000000000000000 boot_stack_end 0000000000900010 g O .bss 0000000000000018 .hidden pio_ops 0000000000900028 g O .bss 0000000000000008 .hidden boot_params_ptr 0000000000900040 l O .bss 0000000000001000 scratch.0 0000000000901040 l O .bss 0000000000010000 boot_heap My thought was that increasing the size of boot_stack should be harmless. As for the memset part, there could be an alternative way, althought it looks a bit ugly. memset(_bss+0x10000, 0, _ebss - _bss - 0x10000) However, I've updated the diff, if you think this is more like a workaround, please take this thread as a bug report. Thanks! Regards, Marshall Shao Signed-off-by: Marshall Shao <marshall.shao@xxxxxxxx> --- arch/x86/boot/compressed/misc.c | 4 ++-- arch/x86/include/asm/boot.h | 2 +- drivers/firmware/efi/libstub/x86-5lvl.c | 2 +- drivers/firmware/efi/libstub/x86-stub.c | 2 -- include/linux/decompress/mm.h | 2 +- 5 files changed, 5 insertions(+), 7 deletions(-) diff --git a/arch/x86/boot/compressed/misc.c b/arch/x86/boot/compressed/misc.c index 944454306ef4..49b68f57dd18 100644 --- a/arch/x86/boot/compressed/misc.c +++ b/arch/x86/boot/compressed/misc.c @@ -50,8 +50,8 @@ struct boot_params *boot_params_ptr; struct port_io_ops pio_ops; -memptr free_mem_ptr; -memptr free_mem_end_ptr; +memptr free_mem_ptr __section(".data"); +memptr free_mem_end_ptr __section(".data"); int spurious_nmi_count; static char *vidmem; diff --git a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h index 3e5b111e619d..312bc87ab027 100644 --- a/arch/x86/include/asm/boot.h +++ b/arch/x86/include/asm/boot.h @@ -33,7 +33,7 @@ #endif #ifdef CONFIG_X86_64 -# define BOOT_STACK_SIZE 0x4000 +# define BOOT_STACK_SIZE 0x10000 /* * Used by decompressor's startup_32() to allocate page tables for identity diff --git a/drivers/firmware/efi/libstub/x86-5lvl.c b/drivers/firmware/efi/libstub/x86-5lvl.c index 77359e802181..bebae4fdfb93 100644 --- a/drivers/firmware/efi/libstub/x86-5lvl.c +++ b/drivers/firmware/efi/libstub/x86-5lvl.c @@ -10,7 +10,7 @@ bool efi_no5lvl; -static void (*la57_toggle)(void *cr3); +static void (*la57_toggle)(void *cr3) __section(".data"); static const struct desc_struct gdt[] = { [GDT_ENTRY_KERNEL32_CS] = GDT_ENTRY_INIT(DESC_CODE32, 0, 0xfffff), diff --git a/drivers/firmware/efi/libstub/x86-stub.c b/drivers/firmware/efi/libstub/x86-stub.c index 078055b054e3..ffc62af50669 100644 --- a/drivers/firmware/efi/libstub/x86-stub.c +++ b/drivers/firmware/efi/libstub/x86-stub.c @@ -21,7 +21,6 @@ #include "efistub.h" #include "x86-stub.h" -extern char _bss[], _ebss[]; const efi_system_table_t *efi_system_table; const efi_dxe_services_table_t *efi_dxe_table; @@ -1059,7 +1058,6 @@ void __noreturn efi_stub_entry(efi_handle_t handle, void efi_handover_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg, struct boot_params *boot_params) { - memset(_bss, 0, _ebss - _bss); efi_stub_entry(handle, sys_table_arg, boot_params); } diff --git a/include/linux/decompress/mm.h b/include/linux/decompress/mm.h index ac862422df15..62c04691c898 100644 --- a/include/linux/decompress/mm.h +++ b/include/linux/decompress/mm.h @@ -36,7 +36,7 @@ /* A trivial malloc implementation, adapted from * malloc by Hannu Savolainen 1993 and Matthias Urlichs 1994 */ -STATIC_RW_DATA unsigned long malloc_ptr; +STATIC_RW_DATA unsigned long malloc_ptr __section(".data"); STATIC_RW_DATA int malloc_count; MALLOC_VISIBLE void *malloc(int size) -- 2.34.1 -----Original Message----- From: Ard Biesheuvel <ardb@xxxxxxxxxx> Sent: Wednesday, July 17, 2024 11:27 PM To: Shao, Marshall <Marshall.Shao@xxxxxxxx> Cc: linux-efi@xxxxxxxxxxxxxxx; linux-kernel@xxxxxxxxxxxxxxx; hpa@xxxxxxxxx; dave.hansen@xxxxxxxxxxxxxxx; bp@xxxxxxxxx; mingo@xxxxxxxxxx; tglx@xxxxxxxxxxxxx Subject: Re: [Patch] Do not clear BSS region in x86 stub [EXTERNAL EMAIL] On Wed, 17 Jul 2024 at 08:17, Shao, Marshall <Marshall.Shao@xxxxxxxx> wrote: > > Clearing the BSS region may cause the UEFI firmware to malfunction > during boot. > > When booting the kernel from an older firmware version that has TPM > enabled and the MemoryOverwriteRequestControl bit set to 1, the > firmware's boot service might encounter an exception if it attempts to > initialize the BSS region within the x86 stub. > > To circumvent the firmware exception, it is advisable to enlarge the > BOOT_STACK_SIZE and to perform the initialization of static variables > prior to the decompression of the bzImage. > What does 'advisable' mean? This patch looks like a few hacks thrown together that happen to work around some issue in combination. Please explain what the exact problem is, and how this patch fixes it. If you don't know what the exact problem is, please say so. > Signed-off-by: Marshall Shao <marshall.shao@xxxxxxxx> > --- > arch/x86/boot/compressed/misc.c | 8 +++----- > arch/x86/include/asm/boot.h | 2 +- > drivers/firmware/efi/libstub/x86-stub.c | 5 ----- > 3 files changed, 4 insertions(+), 11 deletions(-) > > diff --git a/arch/x86/boot/compressed/misc.c > b/arch/x86/boot/compressed/misc.c index b70e4a21c15f..bac5a3c55c2c > 100644 > --- a/arch/x86/boot/compressed/misc.c > +++ b/arch/x86/boot/compressed/misc.c > @@ -356,11 +356,9 @@ unsigned long decompress_kernel(unsigned char *outbuf, unsigned long virt_addr, > void (*error)(char *x)) { > unsigned long entry; > - > - if (!free_mem_ptr) { > - free_mem_ptr = (unsigned long)boot_heap; > - free_mem_end_ptr = (unsigned long)boot_heap + sizeof(boot_heap); > - } > + free_mem_ptr = (unsigned long)boot_heap; > + free_mem_end_ptr = (unsigned long)boot_heap + sizeof(boot_heap); > + malloc_ptr = free_mem_ptr; > This is not safe. This reinitializes the heap after allocations have been made already. It would be better to annotate free_mem_ptr as __section(".data") to ensure that it is zeroed even if BSS is not cleared. > if (__decompress(input_data, input_len, NULL, NULL, outbuf, output_len, > NULL, error) < 0) diff --git > a/arch/x86/include/asm/boot.h b/arch/x86/include/asm/boot.h index > 3e5b111e619d..312bc87ab027 100644 > --- a/arch/x86/include/asm/boot.h > +++ b/arch/x86/include/asm/boot.h > @@ -33,7 +33,7 @@ > #endif > > #ifdef CONFIG_X86_64 > -# define BOOT_STACK_SIZE 0x4000 > +# define BOOT_STACK_SIZE 0x10000 > Why is this necessary? EFI boot does not use this stack, only legacy boot via the decompressor. > /* > * Used by decompressor's startup_32() to allocate page tables for > identity diff --git a/drivers/firmware/efi/libstub/x86-stub.c > b/drivers/firmware/efi/libstub/x86-stub.c > index 1983fd3bf392..d92d2ccc709b 100644 > --- a/drivers/firmware/efi/libstub/x86-stub.c > +++ b/drivers/firmware/efi/libstub/x86-stub.c > @@ -21,7 +21,6 @@ > #include "efistub.h" > #include "x86-stub.h" > > -extern char _bss[], _ebss[]; > > const efi_system_table_t *efi_system_table; const > efi_dxe_services_table_t *efi_dxe_table; @@ -476,9 +475,6 @@ > efi_status_t __efiapi efi_pe_entry(efi_handle_t handle, > efi_status_t status; > char *cmdline_ptr; > > - if (efi_is_native()) > - memset(_bss, 0, _ebss - _bss); > - This bit has already been removed in mainline. > efi_system_table = sys_table_arg; > > /* Check if we were booted by the EFI firmware */ @@ -1000,7 > +996,6 @@ void __noreturn efi_stub_entry(efi_handle_t handle, void > efi_handover_entry(efi_handle_t handle, efi_system_table_t *sys_table_arg, > struct boot_params *boot_params) { > - memset(_bss, 0, _ebss - _bss); Removing this is not safe. free_mem_ptr is not the only global variable in BSS that needs to be zeroed, there are others too. > efi_stub_entry(handle, sys_table_arg, boot_params); } > > -- > 2.34.1 >