On Mon, 19 Aug 2024 at 16:55, Pingfan Liu <piliu@xxxxxxxxxx> wrote:
>
> *** Background ***
>
> As more PE format kernel images are introduced, they pose a challenge to
> kexec, which has to cope with the new format.
>
> In my attempt to add support for arm64 zboot image in the kernel [1],
> Ard suggested using an emulator to tackle this issue. Last year, when
> Jan tried to introduce UKI support in the kernel [2], Ard mentioned the
> emulator approach again [3]
>
> After discussion, Ard's approach seems to be a more promising solution
> to handle PE format kernels once and for all. This series follows that
> approach and implements an emulator to emulate EFI boot time services,
> allowing the efistub kernel to self-extract and boot.
>
> Another year has passed, and UKI kernels are used more and more frequently
> in products. I think it is time to put in the effort to resolve this issue.
>
>
> *** Overview of implementation ***
> The whole model consists of three parts:
>
> -1. The emulator
> It is self-relocatable PIC code, which is ultimately linked into the kernel
> but does not export any internal symbols to the kernel. It mainly contains
> a PE file parser, which loads the PE format kernel, and a group of functions
> that emulate the EFI boot services.
>
> -2. Inside the kernel, a PE-format loader
> Its main task is to set up two extra kexec_segments: one for the emulator,
> the other for passing information from the first kernel to the emulator.
>
> -3. Set up identity mapping only for the memory used by the emulator.
> Here it relies on kimage_alloc_control_pages() to get pages, which will not
> be stomped on during kexec relocation (the copy from src to dst). And since
> the mapping only covers a small range of memory, it costs only a small
> amount of memory.
>
>
> *** To do ***
>
> Currently, it only works on the arm64 virt machine. For x86, it needs some
> slight changes. (I plan to do it in the next version)
>
> Also, this series does not implement a memory allocator, which I plan to
> implement with the help of a bitmap.
>
> About the console: currently it is hard-coded for the arm64 virt machine;
> later it should extract the information from the ACPI tables.
>
> For kdump, the code is not implemented yet. But it should share the majority
> of this series.
>
>
> *** Test of this series ***
> I have tested this series on the arm64 virt machine. There I booted the
> vmlinuz.efi and kexec_file_load'ed a UKI image, then switched to the second
> kernel.
>
> I used a modified kexec-tools [4], which just skips the check of the file
> format and passes the file directly to the kernel.
>
> [1]: https://lore.kernel.org/linux-arm-kernel/ZBvKSis+dfnqa+Vz@xxxxxxxxxxxxxxxxxxxxxxxxxx/T/#m42abb0ad3c10126b8b3bfae8a596deb707d6f76e
> [2]: https://lore.kernel.org/lkml/20230918173607.421d2616@rotkaeppchen/T/
> [3]: https://lore.kernel.org/lkml/20230918173607.421d2616@rotkaeppchen/T/#mc60aa591cb7616ceb39e1c98f352383f9ba6e985
> [4]: https://github.com/pfliu/kexec-tools.git branch: kexec_uefi_emulator
>
>
> RFCv1 -> RFCv2:
> -1. Support running a UKI kernel by: adding LoadImage() and StartImage(),
>     adding PE file relocation support, adding InstallMultiProtocol()
> -2. Also set up idmap for the EFI runtime memory descriptors, since UKI's
>     systemd-stub calls runtime services
> -3. Move kexec_pe_image.c from arch/arm64/kernel to kernel/, since it
>     aims to provide more general architecture support.
>
> RFCv1: https://lore.kernel.org/linux-efi/20240718085759.13247-1-piliu@xxxxxxxxxx/
> RFCv2: https://github.com/pfliu/linux.git branch kexec_uefi_emulator_RFCv2
>
> Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>
> Cc: Jan Hendrik Farr <kernel@xxxxxxxx>
> Cc: Philipp Rudo <prudo@xxxxxxxxxx>
> Cc: Lennart Poettering <mzxreary@xxxxxxxxxxx>
> Cc: Jarkko Sakkinen <jarkko@xxxxxxxxxx>
> Cc: Eric Biederman <ebiederm@xxxxxxxxxxxx>
> Cc: Baoquan He <bhe@xxxxxxxxxx>
> Cc: Dave Young <dyoung@xxxxxxxxxx>
> Cc: Mark Rutland <mark.rutland@xxxxxxx>
> Cc: Will Deacon <will@xxxxxxxxxx>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: kexec@xxxxxxxxxxxxxxxxxxx
> Cc: linux-efi@xxxxxxxxxxxxxxx
> Cc: linux-kernel@xxxxxxxxxxxxxxx
>
>
>
> Pingfan Liu (9):
>   efi/libstub: Ask efi_random_alloc() to skip unusable memory
>   efi/libstub: Complete efi_simple_text_output_protocol
>   efi/emulator: Initial rountines to emulate EFI boot time service
>   efi/emulator: Turn on mmu for arm64
>   kexec: Introduce kexec_pe_image to parse and load PE file
>   arm64: kexec: Introduce a new member param_mem to kimage_arch
>   arm64: mm: Change to prototype of
>   arm64: kexec: Prepare page table for emulator
>   arm64: kexec: Enable kexec_pe_image
>

Thanks for putting this RFC together. This is useful work, and gives
us food for thought and discussion.

There are a few problems that become apparent when going through these
changes.

1. Implementing UEFI entirely is intractable, and unnecessary.
Implementing the subset of UEFI that is actually needed to boot Linux
*is* tractable, though, but we need to work together to write this
down somewhere.
  - the EFI stub needs the boot services for the EFI memory map and
    the allocation routines
  - GRUB needs block I/O
  - systemd-stub/UKI needs file I/O to look for sidecars
  - etc etc

I implemented a Rust 'efiloader' crate a while ago that encapsulates
most of this (it can boot Linux/arm64 on QEMU and boot x86 via GRUB in
user space **).
Adding file I/O to this should be straight-forward - as Lennart points
out, we only need the protocol, it doesn't need to be backed by an
actual file system, it just needs to be able to expose other files in
the right way.

2. Running the UEFI emulator on bare metal is not going to scale.
Cloning UART driver code and MMU code etc is a can of worms that you
want to leave closed. And as Lennart points out, there is other
hardware (TPM) that needs to be accessible as well. Providing a
separate set of drivers for all hardware that the EFI emulator may
need to access is not a tractable problem either.

The fix for this, as I see it, is to run the EFI emulator in user
space, to the point where the payload calls ExitBootServices(). This
will allow all I/O and memory protocol to be implemented trivially,
using C library routines. I have a crude prototype** of this running
to the point where ExitBootServices() is called (and then it crashes).

The tricky yet interesting bit here is how we migrate a chunk of user
space memory to the bare metal context that will be created by the
kexec syscall later (in which the call to ExitBootServices() would
return and proceed with the boot). But the principle is rather
straight-forward, and would permit us, e.g., to kexec an OS installer
too.

3. We need to figure out how to support TPM and PCRs in the context of
kexec. This is a fundamental issue with verified boot, given that the
kexec PCR state is necessarily different from the boot state, and so
we cannot reuse the TPM directly if we want to pretend that we are
doing an ordinary boot in kexec. The alternative is to leave the TPM
in a state where the kexec kernel can access its sealed secrets, and
mock up the TCG2 EFI protocols using a shim that sits between the TPM
hardware (as the real TCG2 protocols will be long gone) and the EFI
payload. But as I said, this is a fundamental issue, as the ability to
pretend that a kexec boot is a pristine boot would mean that verified
boot is broken.

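To make point 2 a bit more concrete, here is a minimal sketch in C of
how a user space emulator might back an AllocatePages()-style boot
service with plain C library calls, while recording each region so a
memory map can be reported later. This is an illustration only, not
the prototype mentioned above: emu_allocate_pages(),
emu_get_memory_map(), emu_release_all() and struct emu_region are
hypothetical names, and real EFI status codes, allocation types and
descriptor layouts are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define EMU_PAGE_SIZE   4096u
#define EMU_MAX_REGIONS 64u
#define EMU_LOADER_DATA 2u  /* value of EfiLoaderData in the UEFI memory type enum */

/* Hypothetical record of one region handed out to the EFI payload. */
struct emu_region {
    void    *base;    /* page-aligned user space address */
    size_t   npages;  /* size of the region in pages */
    uint32_t type;    /* EFI memory type requested by the payload */
};

static struct emu_region emu_regions[EMU_MAX_REGIONS];
static size_t emu_nr_regions;

/* Stand-in for AllocatePages(): returns 0 on success, -1 on failure. */
static int emu_allocate_pages(uint32_t type, size_t npages, void **out)
{
    void *p;

    assert(out != NULL);
    if (npages == 0 || npages > SIZE_MAX / EMU_PAGE_SIZE)
        return -1;
    if (emu_nr_regions == EMU_MAX_REGIONS)
        return -1;

    /* The C library does the actual work; no page tables or drivers needed. */
    p = aligned_alloc(EMU_PAGE_SIZE, npages * EMU_PAGE_SIZE);
    if (p == NULL)
        return -1;

    emu_regions[emu_nr_regions].base = p;
    emu_regions[emu_nr_regions].npages = npages;
    emu_regions[emu_nr_regions].type = type;
    emu_nr_regions++;
    *out = p;
    return 0;
}

/*
 * Stand-in for GetMemoryMap(): copies at most 'max' entries into 'buf'
 * and returns the total number of regions, so the caller can detect a
 * buffer that was too small and retry with a larger one.
 */
static size_t emu_get_memory_map(struct emu_region *buf, size_t max)
{
    size_t n = emu_nr_regions < max ? emu_nr_regions : max;

    for (size_t i = 0; i < n; i++)
        buf[i] = emu_regions[i];
    return emu_nr_regions;
}

/* Free every tracked region and reset the table. */
static void emu_release_all(void)
{
    for (size_t i = 0; i < emu_nr_regions; i++)
        free(emu_regions[i].base);
    emu_nr_regions = 0;
}
```

Keeping a table of every region is what would let such an emulator
know which chunks of user space memory have to be carried over to the
bare metal context once the payload calls ExitBootServices(); how
that migration is done is the open problem described above and is not
addressed by this sketch.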
As future work, I'd like to propose to collaborate on some alignment
regarding a UEFI baseline for Linux, i.e., the parts that we actually
need to boot Linux.

For this series in particular, I don't see a way forward where we
adopt this approach, and carry all this code inside the kernel.

Thanks.
Ard.