On March 6, 2025 6:36:04 AM PST, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>(cc Peter)
>
>On Tue, 4 Mar 2025 at 15:49, Ulrich Gemkow
><ulrich.gemkow@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> Hello,
>>
>> starting with stable kernel 6.6.18 we have problems with PXE booting.
>> A bisect shows that the following patch is guilty:
>>
>> From 768171d7ebbce005210e1cf8456f043304805c15 Mon Sep 17 00:00:00 2001
>> From: Ard Biesheuvel <ardb@xxxxxxxxxx>
>> Date: Tue, 12 Sep 2023 09:00:55 +0000
>> Subject: x86/boot: Remove the 'bugger off' message
>>
>> Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
>> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
>> Acked-by: H. Peter Anvin (Intel) <hpa@xxxxxxxxx>
>> Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@xxxxxxxxxx
>>
>> With this patch applied PXE starts, requests the kernel and the initrd.
>> Without showing anything on the console, the boot process stops.
>> It seems, that the kernel crashes very early.
>>
>> With stable kernel 6.6.17 PXE boot works without problems.
>>
>> Reverting this single patch (which is part of a larger set of
>> patches) solved the problem for us, PXE boot is working again.
>>
>> We use the packages syslinux-efi and syslinux-common from Debian 12.
>> The used boot files are /efi64/syslinux.efi and /ldlinux.e64.
>>
>
>I managed to track this down to a bug in syslinux, fixed by the hunk
>below. The problem is that syslinux violates the x86 boot protocol,
>which stipulates that the setup header (starting at 0x1f1 bytes into
>the bzImage) must be copied into a zeroed boot_params structure, but
>it also copies the preceding bytes, which could be any value, as they
>overlap with the PE/COFF header or other header data. This produces a
>command line pointer with garbage in the top 32 bits, resulting in an
>early crash.
>
>In your case, you might be able to work around this by removing the
>padding value (=0xffffffff) from arch/x86/boot/setup.ld, given that
>you are building with CONFIG_EFI_STUB disabled. However, this still
>requires fixing on the syslinux side.
>
>
>
>[syslinux base commit 05ac953c23f90b2328d393f7eecde96e41aed067]
>
>--- a/efi/main.c
>+++ b/efi/main.c
>@@ -1139,10 +1139,14 @@
> 	bp = (struct boot_params *)(UINTN)addr;
>
> 	memset((void *)bp, 0x0, BOOT_PARAM_BLKSIZE);
>-	/* Copy the first two sectors to boot_params */
>-	memcpy((char *)bp, kernel_buf, 2 * 512);
> 	hdr = (struct linux_header *)bp;
>
>+	/* Copy the setup header to boot_params */
>+	memcpy(&hdr->setup_sects,
>+	       &((struct linux_header *)kernel_buf)->setup_sects,
>+	       sizeof(struct linux_header) -
>+	       offsetof(struct linux_header, setup_sects));
>+
> 	setup_sz = (hdr->setup_sects + 1) * 512;
> 	if (hdr->version >= 0x20a) {
> 		pref_address = hdr->pref_address;
>--- a/com32/include/syslinux/linux.h
>+++ b/com32/include/syslinux/linux.h
>@@ -116,6 +116,7 @@ struct linux_header {
> 	uint64_t pref_address;
> 	uint32_t init_size;
> 	uint32_t handover_offset;
>+	uint32_t kernel_info_offset;
> } __packed;
>
> struct screen_info {

Interesting. Embarrassing, first of all :) but also interesting, because
this is exactly why we have the "sentinel" field at 0x1f0 to catch
*this specific error* and work around it.
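
For reference, a minimal standalone sketch of the copy the boot protocol asks for, as I understand it: zero the whole boot_params page first, then copy only the setup header, which starts at 0x1f1 and ends at 0x202 plus the byte at 0x201. The names here (copy_setup_header, BOOT_PARAMS_SIZE, SETUP_HDR_START) are made up for illustration and are not syslinux or kernel identifiers, and the buffers are plain byte arrays rather than the real structs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names for this sketch only. */
#define BOOT_PARAMS_SIZE 4096   /* size of the zero page */
#define SETUP_HDR_START  0x1f1  /* first byte of the setup header */
#define SETUP_HDR_JUMP   0x201  /* operand byte of the short jump at 0x200 */

/*
 * Zero the boot_params buffer, then copy only the setup header from the
 * kernel image. Everything below 0x1f1 in bp stays zero, so whatever the
 * image carries there (PE/COFF header, padding) never reaches the kernel.
 *
 * Returns the number of header bytes copied, or 0 if the image is too
 * short or the computed header end is out of range.
 */
static size_t copy_setup_header(uint8_t *bp, const uint8_t *image,
                                size_t image_len)
{
    size_t end;

    memset(bp, 0, BOOT_PARAMS_SIZE);

    if (image_len < 0x202)
        return 0;

    /* End of the setup header: 0x202 + byte value at offset 0x201. */
    end = 0x202 + (size_t)image[SETUP_HDR_JUMP];
    if (end > image_len || end > BOOT_PARAMS_SIZE)
        return 0;

    memcpy(bp + SETUP_HDR_START, image + SETUP_HDR_START,
           end - SETUP_HDR_START);
    return end - SETUP_HDR_START;
}
```

A caller would pass the first sectors of the bzImage as image and a 4096-byte buffer as bp. With an image whose leading bytes are 0xff padding, the bytes below 0x1f1 in bp (the sentinel area included) still come out as zero, which is the property the old two-sector memcpy lost.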