Hi all,

This rather large series (based on -rc2) builds on top of the limited
pKVM support available upstream and gets us to a point where the
hypervisor code at EL2 is capable of running guests in both
non-protected and protected mode on the same system. For more
background information about pKVM, the following (slightly dated) LWN
article may be informative:

  https://lwn.net/Articles/836693/

The structure of this series is roughly as follows:

  * Patches 01-06:
    - Some small cleanups and minor fixes.

  * Patches 07-12:
    - Memory management changes at EL2 to allow the donation of memory
      from the host to the hypervisor and the "pinning" of shared
      memory at EL2.

  * Patches 13-16:
    - Introduction of shadow VM and vCPU state at EL2 so that the
      hypervisor can manage guest state using its own private data
      structures, initially populated from the host structures.

  * Patches 17-33:
    - Further memory management changes at EL2 to allow the allocation
      and reclaim of guest memory by the host. This then allows us to
      manage guest stage-2 page-tables entirely at EL2, with the host
      issuing hypercalls to map guest pages in response to faults.

  * Patches 34-78:
    - Gradual reduction of EL2 trust in host data; rather than copy
      blindly between the host and shadow structures, we instead
      selectively sync/flush between them and reduce the amount of
      host data that is accessed directly by EL2.

  * Patches 79-81:
    - Inject an abort into the host if it tries to access a guest page
      for which it does not have permission. This will then deliver a
      SEGV if the access originated from userspace.

  * Patches 82-87:
    - Expose hypercalls to protected guests for sharing memory back
      with the host.

  * Patches 88-89:
    - Introduce the new machine type and add some documentation.

We considered splitting this into multiple series, but decided to keep
everything together initially so that reviewers can more easily get an
idea of what we're trying to do and also take it for a spin.
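
For readers new to the memory-management side, the ownership rules
outlined above (donation, sharing with pinning, reclaim, and the host
losing access to guest pages) can be modelled roughly as follows. This
is only a toy sketch, not code from the series: every name in it
(toy_page, toy_host_donate() and so on) is hypothetical, the error codes
are assumptions, and the real conditions under which reclaim is
permitted are not modelled.

```c
/* Toy model of page ownership transitions. All names are hypothetical. */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

enum toy_owner { TOY_HOST, TOY_HYP, TOY_GUEST };

struct toy_page {
	enum toy_owner owner;     /* exclusive owner of the page */
	bool shared_with_hyp;     /* host-owned page also mapped at EL2 */
	unsigned int pin_count;   /* outstanding EL2 pins on a shared page */
};

/* Donation hands exclusive ownership from the host to hyp or a guest. */
static int toy_host_donate(struct toy_page *p, enum toy_owner to)
{
	if (p->owner != TOY_HOST || p->shared_with_hyp || to == TOY_HOST)
		return -EPERM;	/* assumed error code */
	p->owner = to;
	return 0;
}

/* Sharing keeps the host as owner but lets EL2 access the page too. */
static int toy_host_share_hyp(struct toy_page *p)
{
	if (p->owner != TOY_HOST || p->shared_with_hyp)
		return -EPERM;
	p->shared_with_hyp = true;
	return 0;
}

/* A pin stops the host from unsharing while EL2 is using the page. */
static int toy_hyp_pin(struct toy_page *p)
{
	if (!p->shared_with_hyp)
		return -EPERM;
	p->pin_count++;
	return 0;
}

static void toy_hyp_unpin(struct toy_page *p)
{
	assert(p->pin_count > 0);
	p->pin_count--;
}

static int toy_host_unshare_hyp(struct toy_page *p)
{
	if (!p->shared_with_hyp)
		return -EPERM;
	if (p->pin_count)
		return -EBUSY;	/* still pinned at EL2 */
	p->shared_with_hyp = false;
	return 0;
}

/* Reclaim returns a guest page to the host (preconditions not modelled). */
static int toy_host_reclaim(struct toy_page *p)
{
	if (p->owner != TOY_GUEST)
		return -EPERM;
	p->owner = TOY_HOST;
	return 0;
}

/* A host access to a page it does not own is where an abort would be injected. */
static bool toy_host_can_access(const struct toy_page *p)
{
	return p->owner == TOY_HOST;
}
```

In this model a pinned page cannot be unshared or donated, and a page
donated to a guest is inaccessible to the host until it is reclaimed.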

The patches are also available in our git tree here:

  https://android-kvm.googlesource.com/linux/+/refs/heads/for-upstream/pkvm-base-v1

It's worth pointing out that, although we've been tracking the fd-based
proposal around KVM private memory [1], for now the approach taken here
interacts directly with anonymous pages using a longterm GUP pin. We're
expecting to prototype an fd-based implementation once the discussion
at [2] has converged. In the meantime, we hope to progress the
non-protected VM support.

Finally, there are still some features that we have not included in
this posting and will come later on:

  - Support for read-only memslots and dirty logging for non-protected
    VMs. We currently document that this doesn't work (setting the
    memslot flags will fail), but we're working to enable this.

  - Support for IOMMU configuration to protect guest memory from DMA
    attacks by the host.

  - Support for optional loading of the guest's initial firmware by the
    hypervisor.

  - Proxying of host interactions with TrustZone, intercepting and
    validating FF-A [3] calls at EL2.

  - Support for restricted MMIO exits to only regions designated as
    MMIO by the guest. An earlier version of this work was previously
    posted at [4].

  - Hardware debug and PMU support for non-protected guests -- this
    builds on the separate series posted at [5] and which is now queued
    for 5.19.

  - Guest-side changes to issue the new pKVM hypercalls, for example
    sharing back the SWIOTLB buffer with the host for virtio traffic.
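
To make the first limitation above concrete, the memslot flag
restriction can be pictured as a check along these lines. This is a
hedged, self-contained sketch rather than the code in the series: the
function name and the returned error code are assumptions, and the two
flag values simply mirror KVM_MEM_LOG_DIRTY_PAGES and KVM_MEM_READONLY
from the KVM UAPI.

```c
/* Toy check for memslot flags under pKVM. Names are hypothetical. */
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Same bit positions as KVM_MEM_LOG_DIRTY_PAGES and KVM_MEM_READONLY. */
#define TOY_MEM_LOG_DIRTY_PAGES	(1u << 0)
#define TOY_MEM_READONLY	(1u << 1)

/*
 * Reject dirty logging and read-only memslots when pKVM is enabled.
 * The specific error code is an assumption made for illustration.
 */
static int toy_check_memslot_flags(bool pkvm_enabled, uint32_t flags)
{
	uint32_t unsupported = TOY_MEM_LOG_DIRTY_PAGES | TOY_MEM_READONLY;

	if (pkvm_enabled && (flags & unsupported))
		return -EPERM;
	return 0;
}
```

With pKVM disabled the same flags are accepted, which matches the
behaviour userspace already expects today.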

Please enjoy,

Will, Quentin, Fuad and Marc

[1] https://lore.kernel.org/all/20220310140911.50924-1-chao.p.peng@xxxxxxxxxxxxxxx/
[2] https://lore.kernel.org/r/20220422105612.GB61987@xxxxxxxxxxxxxxxxxx
[3] https://developer.arm.com/documentation/den0077/latest
[4] https://lore.kernel.org/all/20211004174849.2831548-1-maz@xxxxxxxxxx/
[5] https://lore.kernel.org/all/20220510095710.148178-1-tabba@xxxxxxxxxx/

Cc: Ard Biesheuvel <ardb@xxxxxxxxxx>
Cc: Sean Christopherson <seanjc@xxxxxxxxxx>
Cc: Will Deacon <will@xxxxxxxxxx>
Cc: Alexandru Elisei <alexandru.elisei@xxxxxxx>
Cc: Andy Lutomirski <luto@xxxxxxxxxxxxxx>
Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
Cc: James Morse <james.morse@xxxxxxx>
Cc: Chao Peng <chao.p.peng@xxxxxxxxxxxxxxx>
Cc: Quentin Perret <qperret@xxxxxxxxxx>
Cc: Suzuki K Poulose <suzuki.poulose@xxxxxxx>
Cc: Michael Roth <michael.roth@xxxxxxx>
Cc: Mark Rutland <mark.rutland@xxxxxxx>
Cc: Fuad Tabba <tabba@xxxxxxxxxx>
Cc: Oliver Upton <oupton@xxxxxxxxxx>
Cc: Marc Zyngier <maz@xxxxxxxxxx>

Cc: kernel-team@xxxxxxxxxxx
Cc: kvm@xxxxxxxxxxxxxxx
Cc: kvmarm@xxxxxxxxxxxxxxxxxxxxx
Cc: linux-arm-kernel@xxxxxxxxxxxxxxxxxxx

--->8

Fuad Tabba (23):
  KVM: arm64: Add hyp_spinlock_t static initializer
  KVM: arm64: Introduce shadow VM state at EL2
  KVM: arm64: Instantiate VM shadow data from EL1
  KVM: arm64: Do not allow memslot changes after first VM run under pKVM
  KVM: arm64: Add hyp per_cpu variable to track current physical cpu number
  KVM: arm64: Ensure that TLBs and I-cache are private to each vcpu
  KVM: arm64: Do not pass the vcpu to __pkvm_host_map_guest()
  KVM: arm64: Check directly whether the vcpu is protected
  KVM: arm64: Trap debug break and watch from guest
  KVM: arm64: Restrict protected VM capabilities
  KVM: arm64: Do not support MTE for protected VMs
  KVM: arm64: Refactor reset_mpidr to extract its computation
  KVM: arm64: Reset sysregs for protected VMs
  KVM: arm64: Move pkvm_vcpu_init_traps to shadow vcpu init
  KVM: arm64: Fix initializing traps in protected mode
  KVM: arm64: Add EL2 entry/exit handlers for pKVM guests
  KVM: arm64: Refactor kvm_vcpu_enable_ptrauth() for hyp use
  KVM: arm64: Initialize shadow vm state at hyp
  KVM: arm64: Add HVC handling for protected guests at EL2
  KVM: arm64: Move pstate reset values to kvm_arm.h
  KVM: arm64: Move some kvm_psci functions to a shared header
  KVM: arm64: Factor out vcpu_reset code for core registers and PSCI
  KVM: arm64: Handle PSCI for protected VMs in EL2

Marc Zyngier (20):
  KVM: arm64: Handle all ID registers trapped for a protected VM
  KVM: arm64: Drop stale comment
  KVM: arm64: Check for PTE validity when checking for executable/cacheable
  KVM: arm64: Make vcpu_{read,write}_sys_reg available to HYP code
  KVM: arm64: Simplify vgic-v3 hypercalls
  KVM: arm64: Add the {flush,sync}_vgic_state() primitives
  KVM: arm64: Introduce predicates to check for protected state
  KVM: arm64: Add the {flush,sync}_timer_state() primitives
  KVM: arm64: Introduce the pkvm_vcpu_{load,put} hypercalls
  KVM: arm64: Add current vcpu and shadow_state lookup primitive
  KVM: arm64: Skip __kvm_adjust_pc() for protected vcpus
  KVM: arm64: Introduce per-EC entry/exit handlers
  KVM: arm64: Introduce lazy-ish state sync for non-protected VMs
  KVM: arm64: Lazy host FP save/restore
  KVM: arm64: Reduce host/shadow vcpu state copying
  KVM: arm64: Force injection of a data abort on NISV MMIO exit
  KVM: arm64: Donate memory to protected guests
  KVM: arm64: Move vgic state between host and shadow vcpu structures
  KVM: arm64: Do not update virtual timer state for protected VMs
  KVM: arm64: Track the SVE state in the shadow vcpu

Quentin Perret (22):
  KVM: arm64: Move hyp refcount manipulation helpers
  KVM: arm64: Back hyp_vmemmap for all of memory
  KVM: arm64: Implement do_donate() helper for donating memory
  KVM: arm64: Prevent the donation of no-map pages
  KVM: arm64: Add helpers to pin memory shared with hyp
  KVM: arm64: Make hyp stage-1 refcnt correct on the whole range
  KVM: arm64: Factor out private range VA allocation
  KVM: arm64: Add pcpu fixmap infrastructure at EL2
  KVM: arm64: Allow non-coallescable pages in a hyp_pool
  KVM: arm64: Add generic hyp_memcache helpers
  KVM: arm64: Instantiate guest stage-2 page-tables at EL2
  KVM: arm64: Return guest memory from EL2 via dedicated teardown memcache
  KVM: arm64: Add flags to struct hyp_page
  KVM: arm64: Consolidate stage-2 init in one function
  KVM: arm64: Disallow dirty logging and RO memslots with pKVM
  KVM: arm64: Don't access kvm_arm_hyp_percpu_base at EL1
  KVM: arm64: Unmap kvm_arm_hyp_percpu_base from the host
  KVM: arm64: Explicitly map kvm_vgic_global_state at EL2
  KVM: arm64: Don't map host sections in pkvm
  KVM: arm64: Add is_pkvm_initialized() helper
  KVM: arm64: Refactor enter_exception64()
  KVM: arm64: Inject SIGSEGV on illegal accesses

Will Deacon (24):
  KVM: arm64: Remove redundant hyp_assert_lock_held() assertions
  KVM: arm64: Return error from kvm_arch_init_vm() on allocation failure
  KVM: arm64: Ignore 'kvm-arm.mode=protected' when using VHE
  KVM: arm64: Extend comment in has_vhe()
  KVM: arm64: Unify identifiers used to distinguish host and hypervisor
  KVM: arm64: Include asm/kvm_mmu.h in nvhe/mem_protect.h
  KVM: arm64: Provide I-cache invalidation by VA at EL2
  KVM: arm64: Provide a hypercall for the host to reclaim guest memory
  KVM: arm64: Extend memory sharing to allow host-to-guest transitions
  KVM: arm64: Use the shadow vCPU structure in handle___kvm_vcpu_run()
  KVM: arm64: Handle guest stage-2 page-tables entirely at EL2
  KVM: arm64: Maintain a copy of 'kvm_arm_vmid_bits' at EL2
  KVM: arm64: Extend memory donation to allow host-to-guest transitions
  KVM: arm64: Split up nvhe/fixed_config.h
  KVM: arm64: Advertise GICv3 sysreg interface to protected guests
  KVM: arm64: Don't expose TLBI hypercalls after de-privilege
  KVM: arm64: Support TLB invalidation in guest context
  KVM: arm64: Avoid BBM when changing only s/w bits in Stage-2 PTE
  KVM: arm64: Extend memory sharing to allow guest-to-host transitions
  KVM: arm64: Document the KVM/arm64-specific calls in hypercalls.rst
  KVM: arm64: Reformat/beautify PTP hypercall documentation
  KVM: arm64: Expose memory sharing hypercalls to protected guests
  KVM: arm64: Introduce KVM_VM_TYPE_ARM_PROTECTED machine type for PVMs
  Documentation: KVM: Add some documentation for Protected KVM on arm64

 .../admin-guide/kernel-parameters.txt         |    5 +-
 Documentation/virt/kvm/api.rst                |    7 +
 Documentation/virt/kvm/arm/hypercalls.rst     |  118 ++
 Documentation/virt/kvm/arm/index.rst          |    2 +
 Documentation/virt/kvm/arm/pkvm.rst           |   96 ++
 Documentation/virt/kvm/arm/ptp_kvm.rst        |   38 +-
 arch/arm64/include/asm/kvm_arm.h              |   11 +-
 arch/arm64/include/asm/kvm_asm.h              |   28 +-
 arch/arm64/include/asm/kvm_emulate.h          |   92 ++
 arch/arm64/include/asm/kvm_host.h             |  123 +-
 arch/arm64/include/asm/kvm_hyp.h              |   10 +-
 arch/arm64/include/asm/kvm_mmu.h              |    2 +-
 arch/arm64/include/asm/kvm_pgtable.h          |    8 +
 arch/arm64/include/asm/kvm_pkvm.h             |  257 ++++
 arch/arm64/include/asm/virt.h                 |   15 +-
 arch/arm64/kernel/cpufeature.c                |   10 +-
 arch/arm64/kernel/image-vars.h                |   15 -
 arch/arm64/kvm/arch_timer.c                   |    7 +-
 arch/arm64/kvm/arm.c                          |  194 ++-
 arch/arm64/kvm/handle_exit.c                  |   22 +
 arch/arm64/kvm/hyp/exception.c                |   89 +-
 arch/arm64/kvm/hyp/hyp-constants.c            |    3 +
 .../arm64/kvm/hyp/include/nvhe/fixed_config.h |  205 ---
 arch/arm64/kvm/hyp/include/nvhe/gfp.h         |    6 +-
 arch/arm64/kvm/hyp/include/nvhe/mem_protect.h |   25 +-
 arch/arm64/kvm/hyp/include/nvhe/memory.h      |   33 +-
 arch/arm64/kvm/hyp/include/nvhe/mm.h          |   18 +-
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h        |  119 ++
 arch/arm64/kvm/hyp/include/nvhe/spinlock.h    |   10 +-
 .../arm64/kvm/hyp/include/nvhe/trap_handler.h |    2 -
 arch/arm64/kvm/hyp/nvhe/cache.S               |   11 +
 arch/arm64/kvm/hyp/nvhe/hyp-main.c            |  937 +++++++++++++-
 arch/arm64/kvm/hyp/nvhe/hyp-smp.c             |    4 +
 arch/arm64/kvm/hyp/nvhe/mem_protect.c         | 1035 +++++++++++++++-
 arch/arm64/kvm/hyp/nvhe/mm.c                  |  177 ++-
 arch/arm64/kvm/hyp/nvhe/page_alloc.c          |   42 +-
 arch/arm64/kvm/hyp/nvhe/pkvm.c                | 1095 ++++++++++++++++-
 arch/arm64/kvm/hyp/nvhe/setup.c               |   97 +-
 arch/arm64/kvm/hyp/nvhe/switch.c              |    9 +-
 arch/arm64/kvm/hyp/nvhe/sys_regs.c            |  139 ++-
 arch/arm64/kvm/hyp/nvhe/tlb.c                 |   96 +-
 arch/arm64/kvm/hyp/pgtable.c                  |   31 +-
 arch/arm64/kvm/hyp/vgic-v3-sr.c               |   27 +-
 arch/arm64/kvm/mmio.c                         |    9 +
 arch/arm64/kvm/mmu.c                          |  202 ++-
 arch/arm64/kvm/pkvm.c                         |  156 ++-
 arch/arm64/kvm/pmu.c                          |   16 +-
 arch/arm64/kvm/psci.c                         |   28 -
 arch/arm64/kvm/reset.c                        |   99 +-
 arch/arm64/kvm/sys_regs.c                     |   34 +-
 arch/arm64/kvm/sys_regs.h                     |   19 +
 arch/arm64/kvm/vgic/vgic-v2.c                 |    9 +-
 arch/arm64/kvm/vgic/vgic-v3.c                 |   28 +-
 arch/arm64/kvm/vgic/vgic.c                    |   17 +-
 arch/arm64/kvm/vgic/vgic.h                    |    6 +-
 arch/arm64/mm/fault.c                         |   22 +
 include/kvm/arm_vgic.h                        |    3 +-
 include/linux/arm-smccc.h                     |   21 +
 include/uapi/linux/kvm.h                      |    6 +
 59 files changed, 5128 insertions(+), 817 deletions(-)
 create mode 100644 Documentation/virt/kvm/arm/hypercalls.rst
 create mode 100644 Documentation/virt/kvm/arm/pkvm.rst
 delete mode 100644 arch/arm64/kvm/hyp/include/nvhe/fixed_config.h
 create mode 100644 arch/arm64/kvm/hyp/include/nvhe/pkvm.h

-- 
2.36.1.124.g0e6072fb45-goog