On Sun, Nov 11, 2018 at 10:49:39AM -0800, H. Peter Anvin wrote: > On 11/10/18 1:03 AM, Juergen Gross wrote: > > > > How would that help? The garabge data written could have the correct > > terminal sentinel value by chance. > > > > That's why I re-used an existing field in setup_header (the version) to > > let grub tell the kernel which part of setup_header was written by grub. > > > > That's the only way I could find to let the kernel distinguish between > > garbage and actual data. > > There is plenty of space *before* the setup_header part of struct boot_params > too -- look a the various __pad fields, especially (in your case), __pad3[16] > and __pad4[116] would suit the bill just fine. > > >> It would be enormously helpful if you could find out any more details about > >> exactly what they are doing to break things. > > > > That's easy: > > > > The memory layout is: > > > > 0x1f1 bytes of data, including the sentinel, the setup_header, and then > > more data. > > > > grub did read the kernel's setup_header in the correct size into its > > buffer (which contains random garbage before that), intializes the first > > 0x1f1 including the sentinel byte, and then writes back the buffer, but > > using a too large length for that. > > Are you kidding me... it really overwrites it with completely random data, and > not simply overspilling contents of the file? > > In that case it might not be possible (or desirable) to use those N bytes > following the setup_heaader, or we need to a bigger sentinel than one byte > (probability being what it is, 256^n gets to be a pretty big number for any n, > very quickly drowning in the noise compared to other potential sources of boot > failures, and most likely less fatal than most.) > > How big is this garbage dump? I'm going to brave a guess it is 512 bytes. In > that case this is hardly a big deal: the E820 map begins at 0x290, and the > setup_header maximum goes to 0x280, so it is only 15 bytes lost. If it is > worse than that, we would risk losing __pad8[48] and __pad9[276], and > especially the latter would be painful. In those case perhaps we should use > 0x281..0x290 as a 15-byte sentinel; that is going to be virtually foolproof. > > I'm also thinking that it might be desirable to add a flags field (__pad2 > would be ideal) to struct boot_params; it would let us recycle some of the > obsolete fields (hd0_info, hd1_info, sys_desc_table, olpc_ofw_header, ...) and > perhaps be able to add some more robustness against these sort of things. This > would be the right way to do what your version feedback mechanism would do. > > The reason why the feedback mechanism is fundamentally broken is that it only > gives the boot loader a way to assert that it supports a certain version of > the protocol, but it doesn't say *which* bootloader does such an assert -- and > therefore it is still wide open to implementation error. > > We do, in fact, already have a feedback mechanism: the bootloader ID and > bootloader version. One way we could deal with this problem is to bump the > bootloader version reported by Grub, and add a quirk to the kernel that if the > bootloader ID is Grub (7) and the version is less than a certain number, zero > those fields. In fact, the more I think about it, this is what we should do. > > That being said, Grub really needs to be kept honest. They keep making both > severe design (like refusing to use the BIOS and UEFI entry points provided by > the kernel by default) and implementation errors, apparently without > meaningful oversight. I really ask that the people more closely involved with > Grub try to keep a closer eye on their code as it applies to Linux. Cc-ing GRUB and Daniel Kiper (maintainer of GRUB). Could folks please please CC Daniel Kiper on any of these patches in the future? Thanks. > > -hpa