This series introduces core KVM functionality necessary to emulate
Hyper-V's Virtual Secure Mode in a Virtual Machine Monitor (VMM).

Hyper-V's Virtual Secure Mode (VSM) is a virtualization security feature
that leverages the hypervisor to create secure execution environments
within a guest. VSM is documented as part of Microsoft's Hypervisor Top
Level Functional Specification [1]. Security features that build upon
VSM, like Windows Credential Guard, are enabled by default on Windows 11
and are becoming a prerequisite in some industries.

VSM introduces the concept of Virtual Trust Levels (VTLs). These are
independent execution contexts, each with its own CPU architectural
state, local APIC state, and a different view of memory. They are
hierarchical, with more privileged VTLs having priority over the
execution of lower VTLs and control over lower VTLs' state. Windows
leverages these low-level paravirtualized primitives, as well as the
hypervisor's higher trust base, to prevent guest data exfiltration even
when the operating system itself has been compromised.

As discussed at LPC2023 and in our previous RFC [2], we decided to model
each VTL as a distinct KVM VM. With this approach, and the RWX memory
attributes introduced in this series, we have been able to implement VTL
memory protections in a non-intrusive way, using generic KVM APIs.
Additionally, each CPU's VTL is modeled as a distinct KVM vCPU, owned by
the KVM VM tracking that VTL's state. VTL awareness is fully removed from
KVM, and the responsibility for VTL-aware hypercalls, VTL scheduling, and
state transfer is delegated to userspace.

Series overview:
- 1-8: Introduce a number of Hyper-V hypercalls, all of which are
  VTL-aware and expected to be handled in userspace. Additionally, a new
  VTL-specific MP state is introduced.
- 9-10: Pass the instruction length as part of the userspace fault exit
  data in order to simplify VSM's secure intercept generation.
- 11-17: Introduce RWX memory attributes and extend userspace faults.
- 18: Introduce the main VSM CPUID bit, which gates all VTL
  configuration and runtime hypercalls.

The series is accompanied by two repositories:
- A PoC QEMU implementation of VSM [3]: This PoC VSM implementation is
  capable of booting Windows Server 2016 and 2019 with Credential Guard
  (CG) enabled on VMs of any size or vCPU count. It's generally stable,
  but still sees its share of crashes. The PoC itself implements VSM
  interfaces to accommodate CG's needs, and it's by no means
  comprehensive. All in all, don't expect anything usable in production.

- VSM kvm-unit-tests [4]: They cover all VSM hypercalls, as well as the
  KVM APIs introduced by this series. Unfortunately, they depend on the
  QEMU implementation.

We mostly tested on an Intel machine, both with and without TDP. Basic
tests were also run on AMD (build and kvm-unit-tests). Please note that
v2 will include KVM self-tests to close the testing gap, and allow
merging this while we work on the userspace bits.

The series is based on 'kvm/master', that is, commit db574f2f96d0, and
is also available on GitHub [5].

This series also serves as a call-out to anyone interested in
collaborating. We have a proven design, a working PoC, and hopefully a
path forward to merge these KVM APIs. There is still plenty to do in both
QEMU and KVM; I'll post a list of ideas in the future. Feel free to get
in touch!

Thanks,
Nicolas

[1] https://raw.githubusercontent.com/Microsoft/Virtualization-Documentation/master/tlfs/Hypervisor%20Top%20Level%20Functional%20Specification%20v6.0b.pdf
[2] https://lore.kernel.org/lkml/20231108111806.92604-1-nsaenz@xxxxxxxxxx/
[3] https://github.com/vianpl/qemu/tree/vsm-v1
[4] https://github.com/vianpl/kvm-unit-tests/tree/vsm-v1
[5] https://github.com/vianpl/linux/tree/vsm-v1

---
Anish Moorthy (1):
  KVM: Define and communicate KVM_EXIT_MEMORY_FAULT RWX flags to userspace

Nicolas Saenz Julienne (17):
  KVM: x86: hyper-v: Introduce XMM output support
  KVM: x86: hyper-v: Introduce helpers to check if VSM is exposed to guest
  hyperv-tlfs: Update struct hv_send_ipi{_ex}'s declarations
  KVM: x86: hyper-v: Introduce VTL awareness to Hyper-V's PV-IPIs
  KVM: x86: hyper-v: Introduce MP_STATE_HV_INACTIVE_VTL
  KVM: x86: hyper-v: Exit on Get/SetVpRegisters hcall
  KVM: x86: hyper-v: Exit on TranslateVirtualAddress hcall
  KVM: x86: hyper-v: Exit on StartVirtualProcessor and GetVpIndexFromApicId hcalls
  KVM: x86: Keep track of instruction length during faults
  KVM: x86: Pass the instruction length on memory fault user-space exits
  KVM: x86/mmu: Introduce infrastructure to handle non-executable mappings
  KVM: x86/mmu: Avoid warning when installing non-private memory attributes
  KVM: x86/mmu: Init memslot if memory attributes available
  KVM: Introduce RWX memory attributes
  KVM: x86: Take mem attributes into account when faulting memory
  KVM: Introduce traces to track memory attributes modification.
  KVM: x86: hyper-v: Handle VSM hcalls in user-space

 Documentation/virt/kvm/api.rst     | 107 +++++++++++++++++++++++-
 arch/x86/hyperv/hv_apic.c          |   3 +-
 arch/x86/include/asm/hyperv-tlfs.h |   2 +-
 arch/x86/kvm/Kconfig               |   1 +
 arch/x86/kvm/hyperv.c              | 127 +++++++++++++++++++++++++++--
 arch/x86/kvm/hyperv.h              |  18 ++++
 arch/x86/kvm/mmu/mmu.c             |  91 +++++++++++++++++----
 arch/x86/kvm/mmu/mmu_internal.h    |   9 +-
 arch/x86/kvm/mmu/mmutrace.h        |  29 +++++++
 arch/x86/kvm/mmu/paging_tmpl.h     |   2 +-
 arch/x86/kvm/mmu/tdp_mmu.c         |   8 +-
 arch/x86/kvm/svm/svm.c             |   7 +-
 arch/x86/kvm/vmx/vmx.c             |  23 +++++-
 arch/x86/kvm/x86.c                 |  17 +++-
 include/asm-generic/hyperv-tlfs.h  |  16 +++-
 include/linux/kvm_host.h           |  45 +++++++++-
 include/trace/events/kvm.h         |  20 +++++
 include/uapi/linux/kvm.h           |  15 ++++
 virt/kvm/kvm_main.c                |  35 +++++++-
 19 files changed, 527 insertions(+), 48 deletions(-)

-- 
2.40.1