On Thu, 6 Mar 2025 at 15:39, H. Peter Anvin <hpa@xxxxxxxxx> wrote:
>
> On March 6, 2025 6:36:04 AM PST, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
> >(cc Peter)
> >
> >On Tue, 4 Mar 2025 at 15:49, Ulrich Gemkow
> ><ulrich.gemkow@xxxxxxxxxxxxxxxxxxxx> wrote:
> >>
> >> Hello,
> >>
> >> Starting with stable kernel 6.6.18, we have problems with PXE booting.
> >> A bisect shows that the following patch is the culprit:
> >>
> >> From 768171d7ebbce005210e1cf8456f043304805c15 Mon Sep 17 00:00:00 2001
> >> From: Ard Biesheuvel <ardb@xxxxxxxxxx>
> >> Date: Tue, 12 Sep 2023 09:00:55 +0000
> >> Subject: x86/boot: Remove the 'bugger off' message
> >>
> >> Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
> >> Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
> >> Acked-by: H. Peter Anvin (Intel) <hpa@xxxxxxxxx>
> >> Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@xxxxxxxxxx
> >>
> >> With this patch applied, PXE starts and requests the kernel and the
> >> initrd. Then, without showing anything on the console, the boot
> >> process stops. It seems that the kernel crashes very early.
> >>
> >> With stable kernel 6.6.17, PXE boot works without problems.
> >>
> >> Reverting this single patch (which is part of a larger set of
> >> patches) solved the problem for us; PXE boot is working again.
> >>
> >> We use the packages syslinux-efi and syslinux-common from Debian 12.
> >> The boot files used are /efi64/syslinux.efi and /ldlinux.e64.
> >>
> >
> >I managed to track this down to a bug in syslinux, fixed by the hunk
> >below. The problem is that syslinux violates the x86 boot protocol,
> >which stipulates that the setup header (starting at 0x1f1 bytes into
> >the bzImage) must be copied into a zeroed boot_params structure.
> >However, syslinux also copies the preceding bytes, which could hold
> >any value, as they overlap with the PE/COFF header or other header
> >data. This produces a command line pointer with garbage in the top
> >32 bits, resulting in an early crash.
> >
> >In your case, you might be able to work around this by removing the
> >padding value (=0xffffffff) from arch/x86/boot/setup.ld, given that
> >you are building with CONFIG_EFI_STUB disabled. However, this still
> >requires fixing on the syslinux side.
> >
> >[syslinux base commit 05ac953c23f90b2328d393f7eecde96e41aed067]
> >
> >--- a/efi/main.c
> >+++ b/efi/main.c
> >@@ -1139,10 +1139,14 @@
> >     bp = (struct boot_params *)(UINTN)addr;
> > 
> >     memset((void *)bp, 0x0, BOOT_PARAM_BLKSIZE);
> >-    /* Copy the first two sectors to boot_params */
> >-    memcpy((char *)bp, kernel_buf, 2 * 512);
> >     hdr = (struct linux_header *)bp;
> > 
> >+    /* Copy the setup header to boot_params */
> >+    memcpy(&hdr->setup_sects,
> >+           &((struct linux_header *)kernel_buf)->setup_sects,
> >+           sizeof(struct linux_header) -
> >+           offsetof(struct linux_header, setup_sects));
> >+
> >     setup_sz = (hdr->setup_sects + 1) * 512;
> >     if (hdr->version >= 0x20a) {
> >         pref_address = hdr->pref_address;
> >--- a/com32/include/syslinux/linux.h
> >+++ b/com32/include/syslinux/linux.h
> >@@ -116,6 +116,7 @@ struct linux_header {
> >     uint64_t pref_address;
> >     uint32_t init_size;
> >     uint32_t handover_offset;
> >+    uint32_t kernel_info_offset;
> > } __packed;
> > 
> > struct screen_info {
>
> Interesting. Embarrassing, first of all :) but also interesting, because
> this is exactly why we have the "sentinel" field at 0x1f0 to catch *this
> specific error* and work around it.

We're crashing way earlier than the sentinel check: the bogus command
line pointer is dereferenced via

  startup_64()
    configure_5level_paging()
      cmdline_find_option_bool()

whereas sanitize_boot_params() is only called much later, from
extract_kernel().