On Thu, Aug 04, 2022 at 10:58:36AM +0200, Laszlo Ersek wrote: > On 08/04/22 09:03, Michael S. Tsirkin wrote: > > On Thu, Aug 04, 2022 at 02:44:11AM +0200, Jason A. Donenfeld wrote: > >> The boot parameter header refers to setup_data at an absolute address, > >> and each setup_data refers to the next setup_data at an absolute address > >> too. Currently QEMU simply puts the setup_datas right after the kernel > >> image, and since the kernel_image is loaded at prot_addr -- a fixed > >> address knowable to QEMU apriori -- the setup_data absolute address > >> winds up being just `prot_addr + a_fixed_offset_into_kernel_image`. > >> > >> This mostly works fine, so long as the kernel image really is loaded at > >> prot_addr. However, OVMF doesn't load the kernel at prot_addr, and > >> generally EFI doesn't give a good way of predicting where it's going to > >> load the kernel. So when it loads it at some address != prot_addr, the > >> absolute addresses in setup_data now point somewhere bogus, causing > >> crashes when EFI stub tries to follow the next link. > >> > >> Fix this by placing setup_data at some fixed place in memory, relative > >> to real_addr, not as part of the kernel image, and then pointing the > >> setup_data absolute address to that fixed place in memory. This way, > >> even if OVMF or other chains relocate the kernel image, the boot > >> parameter still points to the correct absolute address. > >> > >> Fixes: 3cbeb52467 ("hw/i386: add device tree support") > >> Reported-by: Xiaoyao Li <xiaoyao.li@xxxxxxxxx> > >> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx> > >> Cc: Richard Henderson <richard.henderson@xxxxxxxxxx> > >> Cc: Peter Maydell <peter.maydell@xxxxxxxxxx> > >> Cc: Michael S. Tsirkin <mst@xxxxxxxxxx> > >> Cc: Daniel P. Berrangé <berrange@xxxxxxxxxx> > >> Cc: Gerd Hoffmann <kraxel@xxxxxxxxxx> > >> Cc: Ard Biesheuvel <ardb@xxxxxxxxxx> > >> Cc: linux-efi@xxxxxxxxxxxxxxx > >> Signed-off-by: Jason A. Donenfeld <Jason@xxxxxxxxx> > > > > Didn't read the patch yet. > > Adding Laszlo. > > As I said in > <http://mid.mail-archive.com/8bcc7826-91ab-855e-7151-2e9add88025a@xxxxxxxxxx>, > I don't believe that the setup_data chaining described in > <https://www.kernel.org/doc/Documentation/x86/boot.rst> can be made work > for UEFI guests at all, with QEMU pre-populating the links with GPAs. > > However, rather than introducing a new info channel, or reusing an > existent one (ACPI linker/loader, GUID-ed structure chaining in pflash), > for the sake of this feature, I suggest simply disabling this feature > for UEFI guests. setup_data chaining has not been necessary for UEFI > guests for years (this is the first time I've heard about it in more > than a decade), and the particular use case (provide guests with good > random seed) is solved for UEFI guests via virtio-rng / EFI_RNG_PROTOCOL. > > ... Now, here's my problem: microvm, and Xen. > > As far as I can tell, QEMU can determine (it already does determine) > whether the guest uses UEFI or not, for the "pc" and "q35" machine > types. But not for microvm or Xen! > > Namely, from pc_system_firmware_init() [hw/i386/pc_sysfw.c], we can > derive that > > pflash_cfi01_get_blk(pcms->flash[0]) > > returning NULL vs. non-NULL stands for "BIOS vs. UEFI". Note that this > is only valid after the inital part of pc_system_firmware_init() has run > ("Map legacy -drive if=pflash to machine properties"), but that is not a > problem, given the following call tree: I don't beleve that's a valid check anymore since Gerd introduced the ability to load UEFI via -bios, for UEFI builds without persistent variables. ( a8152c4e4613c70c2f0573a82babbc8acc00cf90 ) > Which is a big problem for my idea, because QEMU has no way of > identifying whether microvm is going to boot a custom SeaBIOS binary > (where the current setup_data chaining is OK) or a custom OVMF binary > (where the current setup_data chaining crashes the guest kernel). > > So I thought that for pc and q35, I should be able to propose a fix, > based on: > > pflash_cfi01_get_blk(pcms->flash[0]) > > but it turns out I don't know what to do about Xen; and worse, for > MicroVM, it's currently *impossible* for QEMU to tell apart UEFI from > other guest firmwares. Yep, and ultimately the inability to distinguish UEFI vs other firmware is arguably correct by design, as the QEMU <-> firmware interface is supposed to be arbitrarily pluggable for any firmware implementation not limited to merely UEFI + seabios. > For now I suggest either reverting the original patch, or at least not > enabling the knob by default for any machine types. In particular, when > using MicroVM, the user must leave the knob disabled when direct booting > a kernel on OVMF, and the user may or may not enable the knob when > direct booting a kernel on SeaBIOS. Having it opt-in via a knob would defeat Jason's goal of having the seed available automatically. With regards, Daniel -- |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :| |: https://libvirt.org -o- https://fstop138.berrange.com :| |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|