Hi

This patch series introduces callbacks to facilitate the entry of a TD VCPU
and the corresponding save/restore of host state.

There are some outstanding things still to do (see below), so we expect to
post future revisions of this patch set, but please do review the current
patches so that they can be made ready for hand off to Paolo. Also, direction
is needed for "x86/virt/tdx: Add SEAMCALL wrapper to enter/exit TDX guest"
because it will affect KVM.

This patch set is one of several patch sets that are all needed to provide
the ability to run a functioning TD VM. They have been split from the
"marker" sections of patch set "[PATCH v19 000/130] KVM TDX basic feature
support":

  https://lore.kernel.org/all/cover.1708933498.git.isaku.yamahata@xxxxxxxxx/

The recent patch sets are:

    TDX host: metadata reading
    TDX vCPU/VM creation
    TDX KVM MMU part 2
    TD vcpu enter/exit                      <- this one
    TD vcpu exits/interrupts/hypercalls     <- still to come

Notably, a later patch set deals with VCPU exits, interrupts and hypercalls.

For x86 maintainers

This series has 1 commit that is an RFC that needs input from x86
maintainers:

    x86/virt/tdx: Add SEAMCALL wrapper to enter/exit TDX guest

This is because wrapping TDH.VP.ENTER means dealing with multiple input and
output formats for the data in argument registers. We would like maintainers
to comment on the discussion that we will have on it.

Overview

A TD VCPU is entered via the SEAMCALL TDH.VP.ENTER. The TDX Module manages
the save/restore of guest state and, in conjunction with the SEAMCALL
interface, handles certain aspects of host state. However, there are specific
elements of the host state that require additional attention, as detailed in
the Intel TDX ABI documentation for TDH.VP.ENTER.

TDX is quite different from VMX in this regard. For VMX, the host VMM is
heavily involved in restoring, managing and saving guest CPU state, whereas
for TDX this is handled by the TDX Module.
In that way, the TDX Module can protect the confidentiality and integrity of
TD CPU state.

The TDX Module does not save/restore all host CPU state because the host VMM
can do it more efficiently and selectively. CPU state referred to below is
host CPU state. Often values are already held in memory so no explicit save
is needed, and restoration may not be needed if the kernel is not using a
feature.

Outstanding things still to do:

 - how to wrap TDH.VP.ENTER SEAMCALL, refer to patch
   "x86/virt/tdx: Add SEAMCALL wrapper to enter/exit TDX guest"
 - tdx_vcpu_enter_exit() calls guest_state_enter_irqoff() and
   guest_state_exit_irqoff(), which comments say should be called from
   non-instrumentable code, but noinstr was removed at Sean's suggestion:
     https://lore.kernel.org/all/Zg8tJspL9uBmMZFO@xxxxxxxxxx/
   noinstr is also needed to retain NMI-blocking by avoiding instrumented
   code that leads to an IRET which unblocks NMIs. A later patch set will
   deal with NMI VM-exits.
 - disallow TDX guest to use Intel PT
   I think Tony will fix tdx_get_supported_xfam()
 - disallow PERFMON (TD attribute bit 63)
 - save/restore MSR IA32_UMWAIT_CONTROL or disallow guest
   CPUID(7,0).ECX.WAITPKG[5]
 - save/restore IA32_DEBUGCTL
   VMX does:
     vmx_vcpu_load() -> get_debugctlmsr()
     vmx_vcpu_run() -> update_debugctlmsr()
   TDX Module only preserves bits 1, 12 and 14

Key Details

Argument Passing: Similar to other SEAMCALLs, TDH.VP.ENTER passes arguments
through General Purpose Registers (GPRs). For the special case of the TD
guest invoking TDG.VP.VMCALL, nearly any GPR can be used, as well as XMM0 to
XMM15. Notably, RBP is not used, and Linux mandates the TDX Module feature
NO_RBP_MOD, which is enforced elsewhere. Additionally, XMM registers are not
required for the existing Guest Hypervisor Communication Interface and are
handled by existing KVM code should they be modified by the guest.
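
For the IA32_DEBUGCTL item in the to-do list above, one possible shape for the
restore decision is sketched here. This is only an illustration, not code
from the series: the names TDX_DEBUGCTL_PRESERVED_BITS and
tdx_debugctl_needs_restore() are hypothetical, and the sketch assumes that
any bit outside the preserved set (bits 1, 12 and 14) may be lost across
TDH.VP.ENTER, so the host value only needs rewriting when such a bit was set.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Bits of IA32_DEBUGCTL that the TDX Module is stated to preserve
 * (bits 1, 12 and 14). The macro name is hypothetical.
 */
#define TDX_DEBUGCTL_PRESERVED_BITS \
	((UINT64_C(1) << 1) | (UINT64_C(1) << 12) | (UINT64_C(1) << 14))

/*
 * Hypothetical helper: returns true if the saved host IA32_DEBUGCTL value
 * has any bit set outside the preserved set. Under the assumption that
 * non-preserved bits may be lost across TDH.VP.ENTER, only then would the
 * host value need to be written back.
 */
static inline bool tdx_debugctl_needs_restore(uint64_t host_debugctl)
{
	return (host_debugctl & ~TDX_DEBUGCTL_PRESERVED_BITS) != 0;
}
```

A caller would sample the host value before entry (VMX uses
get_debugctlmsr() for this) and, after TDH.VP.ENTER returns, write it back
(VMX uses update_debugctlmsr()) only when the helper returns true, which
avoids an MSR write in the common case where the host uses no other bits.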

Debug Register Handling: After TDH.VP.ENTER returns, registers DR0, DR1, DR2,
DR3, DR6, and DR7 are set to their architectural INIT values. Existing KVM
code already handles the restoration of host values as needed, refer to
vcpu_enter_guest() which calls hw_breakpoint_restore().

MSR Restoration: Certain Model-Specific Registers (MSRs) need to be restored
post TDH.VP.ENTER. The Intel TDX ABI documentation provides a detailed list
in the msr_preservation.json file. Most MSRs do not require restoration if
the guest is not utilizing the corresponding feature. The following features
are currently assumed to be unsupported, and their MSRs are not restored:

    PERFMON (TD ATTRIBUTES[63])
    LBRs (XFAM[15])
    User Interrupts (XFAM[14])
    Intel PT (XFAM[8])

The one feature that is supported:

    CET (XFAM[11-12]) is restored via kvm_put_guest_fpu()

Other host MSR/Register Handling:

MSR IA32_XFD is already restored by KVM, refer to kvm_put_guest_fpu().

The TDX Module sets MSR IA32_XFD_ERR to its RESET value (0) which is fine
for the kernel.

MSR IA32_DEBUGCTL appears to have been overlooked. According to
msr_preservation.json, the TDX Module preserves only bits 1, 12 and 14. For
VMX there is code to save and restore in vmx_vcpu_load() and vmx_vcpu_run()
respectively, but TDX does not use those functions.

MSR IA32_UARCH_MISC_CTL is not utilized by the kernel, so it is fine if the
TDX Module sets it to its RESET value.

MSR IA32_KERNEL_GS_BASE is addressed in patch "KVM: TDX: vcpu_run:
save/restore host state(host kernel gs)".

MSR IA32_XSS and XCR0 are handled in patch "KVM: TDX: restore host xsave
state when exit from the guest TD".

MSRs IA32_STAR, IA32_LSTAR, IA32_FMASK, and IA32_TSC_AUX are handled in patch
"KVM: TDX: restore user ret MSRs".

MSR IA32_TSX_CTRL is handled in patch "KVM: TDX: Add TSX_CTRL msr into
uret_msrs list".

MSR IA32_UMWAIT_CONTROL appears to have been overlooked.
The host value needs to be restored if guest CPUID(7,0).ECX.WAITPKG[5] is 1,
otherwise that guest CPUID value needs to be disallowed.

Additional Notes

The patch "KVM: TDX: Implement TDX vcpu enter/exit path" highlights that TDX
does not support "PAUSE-loop exiting". According to the TDX Module Base arch.
spec., hypercalls are expected to be used instead. Note that the Linux TDX
guest supports existing hypercalls via TDG.VP.VMCALL.

Base

This series is based on a kvm-coco-queue commit and some pre-req series:

1. commit ee69eb746754 ("KVM: x86/mmu: Prevent aliased memslot GFNs") (in
   kvm-coco-queue).
2. v7 of "TDX host: metadata reading tweaks, bug fix and info dump" [1].
3. v1 of "KVM: VMX: Initialize TDX when loading KVM module" [2], with some
   new feedback from Sean.
4. v2 of "TDX vCPU/VM creation" [3]
5. v2 of "TDX KVM MMU part 2" [4]

It requires TDX module 1.5.06.00.0744 [5], or later. This is due to removal
of the workarounds for the lack of the NO_RBP_MOD feature required by the
kernel. Now NO_RBP_MOD is enabled (in VM/vCPU creation patches), and this
particular version of the TDX module has a required NO_RBP_MOD related bug
fix.

A working edk2 commit is 95d8a1c ("UnitTestFrameworkPkg: Use TianoCore mirror
of subhook submodule").

Testing

The series has been tested as part of the development branch for the TDX
base series. The testing consisted of TDX kvm-unit-tests, booting a Linux TD,
and TDX enhanced KVM selftests.

The full KVM branch is here:
https://github.com/intel/tdx/tree/tdx_kvm_dev-2024-11-20

Matching QEMU:
https://github.com/intel-staging/qemu-tdx/commits/tdx-qemu-upstream-v6.1/

[0] https://lore.kernel.org/kvm/20240904030751.117579-1-rick.p.edgecombe@xxxxxxxxx/
[1] https://lore.kernel.org/kvm/cover.1731318868.git.kai.huang@xxxxxxxxx/#t
[2] https://lore.kernel.org/kvm/cover.1730120881.git.kai.huang@xxxxxxxxx/
[3] https://lore.kernel.org/kvm/20241030190039.77971-1-rick.p.edgecombe@xxxxxxxxx/
[4] https://lore.kernel.org/kvm/20241112073327.21979-1-yan.y.zhao@xxxxxxxxx/
[5] https://github.com/intel/tdx-module/releases/tag/TDX_1.5.06

Chao Gao (1):
  KVM: x86: Allow to update cached values in kvm_user_return_msrs w/o
    wrmsr

Isaku Yamahata (4):
  KVM: TDX: Implement TDX vcpu enter/exit path
  KVM: TDX: vcpu_run: save/restore host state(host kernel gs)
  KVM: TDX: restore host xsave state when exit from the guest TD
  KVM: TDX: restore user ret MSRs

Kai Huang (1):
  x86/virt/tdx: Add SEAMCALL wrapper to enter/exit TDX guest

Yang Weijiang (1):
  KVM: TDX: Add TSX_CTRL msr into uret_msrs list

 arch/x86/include/asm/kvm_host.h |   1 +
 arch/x86/include/asm/tdx.h      |   1 +
 arch/x86/kvm/vmx/main.c         |  45 ++++++++-
 arch/x86/kvm/vmx/tdx.c          | 212 ++++++++++++++++++++++++++++++++++++++++
 arch/x86/kvm/vmx/tdx.h          |  14 +++
 arch/x86/kvm/vmx/x86_ops.h      |   9 ++
 arch/x86/kvm/x86.c              |  24 ++++-
 arch/x86/virt/vmx/tdx/tdx.c     |   8 ++
 arch/x86/virt/vmx/tdx/tdx.h     |   1 +
 9 files changed, 306 insertions(+), 9 deletions(-)

Regards
Adrian