From: Ard Biesheuvel <ardb@xxxxxxxxxx>

One of the tedious bits of booting a virtual machine under KVM on ARM is dealing with guest memory coherency. This is due to the fact that running with the MMU off is problematic, as manipulations of memory by the guest are incoherent with the host's cached view of memory. For this reason, KVM needs to keep track of the MMU state of the guest, and perform cache maintenance to the point of coherency (PoC) on all memory that is exposed to the guest (and populated at stage 2) at that point.

Existing VM firmware is often based on bare metal firmware, which sets up page tables with the MMU and caches off, and does the necessary (as well as unnecessary *) cache maintenance. This ensures that all manipulations of memory performed with the MMU off are coherent, and are not shadowed by stale cachelines (clean or dirty), which either obscure the real memory contents or, if dirty and inadvertently evicted and written back, corrupt them.

As firmware is usually intimately tied to the memory topology of the platform, we can do much better than this. Instead of setting up the initial page tables at runtime, we can bake them into the boot image, provided that it runs at an address known a priori. This means we can enable the MMU and caches straight out of reset, and defer all memory accesses that go via the D side until after that.

This is the approach taken by this series: it implements a minimal firmware/bootloader for booting a Linux arm64 kernel on QEMU's mach-virt, which executes minimal code and performs no memory accesses (other than instruction fetches) with the MMU disabled.

Combined with the series that I sent out recently [0] for Linux, which implements something similar for the kernel itself, virtually all cache maintenance to the PoC can be dropped from the boot flow (with the exception of the .idmap page in the kernel itself).
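To make "baking the initial page tables into the boot image" more concrete, here is a minimal Rust sketch; it is not the series' own code. It assumes a 4 KiB granule with the walk starting at level 1 (e.g. TCR_EL1.T0SZ = 25), MAIR_EL1 attribute 0 = Normal write-back and attribute 1 = Device-nGnRE, and a purely hypothetical 4 GiB layout: boot image in GiB 0, MMIO in GiB 1, DRAM in GiB 2-3. The names `Table`, `build_id_map` and `ID_MAP` are made up for illustration. Because `build_id_map` is a `const fn`, the compiler emits the descriptors as static data, so nothing has to be written to memory before the MMU is turned on.

```rust
// Stage-1 block descriptor bits (VMSAv8-64, 4 KiB granule).
const BLOCK: u64 = 0b01; // valid block descriptor (level 1 or 2)
const ATTR_NORMAL: u64 = 0 << 2; // AttrIndx = 0, assumed Normal WB in MAIR_EL1
const ATTR_DEVICE: u64 = 1 << 2; // AttrIndx = 1, assumed Device-nGnRE in MAIR_EL1
const AP_RO: u64 = 1 << 7; // AP[2] = 1: read-only
const SH_INNER: u64 = 3 << 8; // inner shareable
const AF: u64 = 1 << 10; // access flag pre-set, so no AF faults are taken
const PXN: u64 = 1 << 53; // privileged execute-never
const UXN: u64 = 1 << 54; // unprivileged execute-never

const GIB: u64 = 1 << 30; // each level 1 entry maps 1 GiB

// A translation table must be aligned to its own size (4 KiB).
#[repr(C, align(4096))]
struct Table([u64; 512]);

// Hypothetical layout; evaluated entirely at compile time.
const fn build_id_map() -> [u64; 512] {
    let mut t = [0u64; 512];
    // GiB 0 (output address 0): boot image, read-only, executable at EL1.
    t[0] = BLOCK | ATTR_NORMAL | SH_INNER | AF | AP_RO | UXN;
    // GiB 1: MMIO, device memory, never executable.
    t[1] = GIB | BLOCK | ATTR_DEVICE | AF | PXN | UXN;
    // GiB 2-3: DRAM, writable and therefore never executable.
    let mut i = 2;
    while i < 4 {
        t[i] = (i as u64) * GIB | BLOCK | ATTR_NORMAL | SH_INNER | AF | PXN | UXN;
        i += 1;
    }
    t
}

// Identity map (VA == PA) that ends up as read-only data in the image.
static ID_MAP: Table = Table(build_id_map());

fn main() {
    for (i, d) in ID_MAP.0.iter().take(4).enumerate() {
        println!("L1[{i}] = {d:#018x}");
    }
    println!("table at {:p}", &ID_MAP);
}
```

With a table like this in the image, bringing up the MMU comes down to programming MAIR_EL1, TCR_EL1 and TTBR0_EL1 (with the table's address) and then setting SCTLR_EL1.M, .C and .I. Those system register writes are not shown.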
Given that no stores to memory occur at all with the MMU off, KVM should be able to detect that the PoC maintenance is no longer necessary when the MMU is turned on. This is a simplification in itself, and it also means that all but a minimal amount of code executes with restricted memory permissions in force: the firmware boots with WXN protections enabled, so the Rust code itself, as well as the text section of the loaded kernel Image, needs to be mapped read-only in order to be executed.

This prototype is presented as v0 because it cuts some corners; the intent is to turn it into an implementation of EFI that provides all that Linux needs to boot. Most notably:
- only ~900 MiB of DRAM is supported, because the page table code I nicked greedily maps down to pages and the heap is only around 2 MiB, so we run out of memory if we try to map more;
- it boots via the kernel's 'bare metal' entrypoint, as EFI features are entirely missing for the moment;
- only uncompressed kernels are supported.

How to build and run:
(first, build a kernel with [0] applied, so the image tolerates being booted with the MMU and caches enabled)

  $ cargo build # using a nightly Rust compiler
  $ objcopy -O binary target/aarch64-unknown-linux-gnu/debug/efilite efilite.bin
  $ qemu-system-aarch64 \
      -M virt,gic-version=host -cpu host -enable-kvm -smp 4 \
      -net none -nographic -m 900m -bios efilite.bin -kernel path/to/Image \
      -drive if=virtio,file=path/to/hda.xxx,format=xxx -append root=/dev/vda2

* U-Boot in particular carries a lot of set/way cache maintenance that was cargo culted from the v7 days, and should never be needed in a VM.

[0] https://lore.kernel.org/all/20220304175657.2744400-1-ardb@xxxxxxxxxx/

Cc: Marc Zyngier <maz@xxxxxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Quentin Perret <qperret@xxxxxxxxxx>
Cc: David Brazdil <dbrazdil@xxxxxxxxxx>
Cc: Fuad Tabba <tabba@xxxxxxxxxx>
Cc: Kees Cook <keescook@xxxxxxxxxxxx>

Ard Biesheuvel (6):
  Implement a bare metal Rust runtime on top of QEMU's mach-virt
  Add DTB processing
  Add paging code to manage the full ID map
  Discover QEMU fwcfg device and use it to load the kernel
  Remap code section of loaded kernel and boot it
  Temporarily pass the kaslr seed via register X1

 .cargo/config    |   5 +
 .gitignore       |   2 +
 Cargo.lock       |  87 ++++
 Cargo.toml       |  12 +
 efilite.lds      |  62 +++
 src/cmo.rs       |  37 ++
 src/console.rs   |  57 +++
 src/cstring.rs   |   9 +
 src/fwcfg.rs     |  85 ++++
 src/head.S       | 121 +++++
 src/main.rs      | 155 +++++-
 src/pagealloc.rs |  44 ++
 src/paging.rs    | 499 ++++++++++++++++++++
 src/pecoff.rs    |  23 +
 src/ttable.S     |  37 ++
 15 files changed, 1233 insertions(+), 2 deletions(-)
 create mode 100644 .cargo/config
 create mode 100644 Cargo.lock
 create mode 100644 efilite.lds
 create mode 100644 src/cmo.rs
 create mode 100644 src/console.rs
 create mode 100644 src/cstring.rs
 create mode 100644 src/fwcfg.rs
 create mode 100644 src/head.S
 create mode 100644 src/pagealloc.rs
 create mode 100644 src/paging.rs
 create mode 100644 src/pecoff.rs
 create mode 100644 src/ttable.S

-- 
2.30.2