Re: Regression for PXE boot from patch "Remove the 'bugger off' message" in stable 6.6.18

On March 6, 2025 6:36:04 AM PST, Ard Biesheuvel <ardb@xxxxxxxxxx> wrote:
>(cc Peter)
>
>On Tue, 4 Mar 2025 at 15:49, Ulrich Gemkow
><ulrich.gemkow@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> Hello,
>>
>> starting with stable kernel 6.6.18 we have problems with PXE booting.
>> A bisect shows that the following patch is guilty:
>>
>>   From 768171d7ebbce005210e1cf8456f043304805c15 Mon Sep 17 00:00:00 2001
>>   From: Ard Biesheuvel <ardb@xxxxxxxxxx>
>>   Date: Tue, 12 Sep 2023 09:00:55 +0000
>>   Subject: x86/boot: Remove the 'bugger off' message
>>
>>   Signed-off-by: Ard Biesheuvel <ardb@xxxxxxxxxx>
>>   Signed-off-by: Ingo Molnar <mingo@xxxxxxxxxx>
>>   Acked-by: H. Peter Anvin (Intel) <hpa@xxxxxxxxx>
>>   Link: https://lore.kernel.org/r/20230912090051.4014114-21-ardb@xxxxxxxxxx
>>
>> With this patch applied, PXE starts and requests the kernel and the
>> initrd. The boot process then stops without showing anything on the
>> console. It seems that the kernel crashes very early.
>>
>> With stable kernel 6.6.17 PXE boot works without problems.
>>
>> Reverting this single patch (which is part of a larger set of
>> patches) solved the problem for us; PXE boot is working again.
>>
>> We use the packages syslinux-efi and syslinux-common from Debian 12.
>> The used boot files are /efi64/syslinux.efi and /ldlinux.e64.
>>
>
>I managed to track this down to a bug in syslinux, fixed by the hunk
>below. The problem is that syslinux violates the x86 boot protocol,
>which stipulates that the setup header (starting at 0x1f1 bytes into
>the bzImage) must be copied into a zeroed boot_params structure, but
>it also copies the preceding bytes, which could be any value, as they
>overlap with the PE/COFF header or other header data. This produces a
>command line pointer with garbage in the top 32 bits, resulting in an
>early crash.
>
>In your case, you might be able to work around this by removing the
>padding value (=0xffffffff) from arch/x86/boot/setup.ld, given that
>you are building with CONFIG_EFI_STUB disabled. However, this still
>requires fixing on the syslinux side.
>
>
>
>[syslinux base commit 05ac953c23f90b2328d393f7eecde96e41aed067]
>
>--- a/efi/main.c
>+++ b/efi/main.c
>@@ -1139,10 +1139,14 @@
>        bp = (struct boot_params *)(UINTN)addr;
>
>        memset((void *)bp, 0x0, BOOT_PARAM_BLKSIZE);
>-       /* Copy the first two sectors to boot_params */
>-       memcpy((char *)bp, kernel_buf, 2 * 512);
>        hdr = (struct linux_header *)bp;
>
>+        /* Copy the setup header to boot_params */
>+        memcpy(&hdr->setup_sects,
>+              &((struct linux_header *)kernel_buf)->setup_sects,
>+              sizeof(struct linux_header) -
>+              offsetof(struct linux_header, setup_sects));
>+
>        setup_sz = (hdr->setup_sects + 1) * 512;
>        if (hdr->version >= 0x20a) {
>                pref_address = hdr->pref_address;
>--- a/com32/include/syslinux/linux.h
>+++ b/com32/include/syslinux/linux.h
>@@ -116,6 +116,7 @@ struct linux_header {
>     uint64_t pref_address;
>     uint32_t init_size;
>     uint32_t handover_offset;
>+    uint32_t kernel_info_offset;
> } __packed;
>
> struct screen_info {

Interesting. Embarrassing, first of all :) but also interesting, because this is exactly why we have the "sentinel" field at 0x1ef: to catch *this specific error* and work around it.
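
To make the sentinel idea concrete, here is a rough sketch, not the kernel's actual code. It assumes the sentinel byte sits just below the setup header (0x1ef, with the header starting at 0x1f1) and that the image carries a non-zero value there. All function names and the hdr_len parameter are made up for illustration. The sanitize step is a deliberate simplification: it wipes everything outside the setup header, whereas the real kernel preserves a specific list of fields.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BOOT_PARAMS_SIZE 4096
#define SENTINEL_OFFSET  0x1ef  /* assumed: byte just below the setup header */
#define SETUP_HDR_OFFSET 0x1f1  /* setup_sects, first byte of the setup header */

/* Broken loader (hypothetical name): copies the first two sectors wholesale,
 * dragging the image's non-zero sentinel and other junk into boot_params. */
static void copy_two_sectors(uint8_t *bp, const uint8_t *image)
{
    memset(bp, 0, BOOT_PARAMS_SIZE);
    memcpy(bp, image, 2 * 512);
}

/* Conforming loader (hypothetical name): zero everything, then copy only
 * the setup header, so the sentinel byte in boot_params stays zero. */
static void copy_setup_header_only(uint8_t *bp, const uint8_t *image,
                                   size_t hdr_len)
{
    memset(bp, 0, BOOT_PARAMS_SIZE);
    memcpy(bp + SETUP_HDR_OFFSET, image + SETUP_HDR_OFFSET, hdr_len);
}

/* Kernel-side check: a non-zero sentinel means the loader copied too much. */
static int sentinel_tripped(const uint8_t *bp)
{
    return bp[SENTINEL_OFFSET] != 0;
}

/* Simplified workaround: if the sentinel tripped, clear everything outside
 * the setup header. ASSUMPTION: the real kernel keeps more fields than this. */
static void sanitize(uint8_t *bp, size_t hdr_len)
{
    size_t hdr_end = SETUP_HDR_OFFSET + hdr_len;

    if (!sentinel_tripped(bp))
        return;
    memset(bp, 0, SETUP_HDR_OFFSET);  /* also clears the sentinel itself */
    memset(bp + hdr_end, 0, BOOT_PARAMS_SIZE - hdr_end);
}
```

With an image whose bytes below 0x1f1 are non-zero, copy_two_sectors() trips the sentinel and sanitize() then clears the stray bytes, while copy_setup_header_only() never trips it.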




