Hi,

This patch series adds support for EPT violation/misconfig handling and several TDVMCALL leaves, adds a bunch of wrappers to ignore the operations not supported by TDX guests, and adds documentation. This patch series is the last part needed to provide the ability to run a functioning TD VM.

We think this is in pretty good shape at this point and ready for handoff to Paolo.

Base of this series
===================
This series is based on kvm-coco-queue up to the end of "TDX interrupts", plus one PAT quirk series. Stack is:
- '31db5921f12d ("KVM: TDX: Handle EXIT_REASON_OTHER_SMI")' from kvm-coco-queue.
- PAT quirk series "KVM: x86: Introduce quirk KVM_X86_QUIRK_EPT_IGNORE_GUEST_PAT" [0].

Notable changes since v1 [1]
============================
Patch "KVM: x86: Add a switch_db_regs flag to handle TDX's auto-switched behavior" is moved to "KVM: TDX: TD vcpu enter/exit" [2].

Rebased after adding tdcall_to_vmx_exit_reason() in [3] and the new way to get exit_qualification and ext_exit_qualification.

For EPT MISCONFIG, bug the VM and return -EIO. The handling is deferred until tdx_handle_exit() because tdx_to_vmx_exit_reason() is called by 'noinstr' code with interrupts disabled.

Add SEPT local retry and wait for SEPT zap logic to provide a clean solution to avoid the blind SEPT retries.

Morph the following guest-requested exit reasons (via TDVMCALL) to KVM's tracked exit reasons:
- Morph PV CPUID to EXIT_REASON_CPUID
- Morph PV HLT to EXIT_REASON_HLT
- Morph PV RDMSR to EXIT_REASON_RDMSR
- Morph PV WRMSR to EXIT_REASON_WRMSR

Check RVI pending (bit 0 of the TD_VCPU_STATE_DETAILS_NON_ARCH field) only for the HALTED case with IRQs enabled in tdx_protected_apic_has_interrupt().

For PV RDMSR/WRMSR handling, marshal values to the appropriate x86 registers to leverage the existing kvm_emulate_{rdmsr,wrmsr}(), and implement the complete_emulated_msr() callback to set the return value/code in vp_enter_args.
Skip setting the return code when the value is TDVMCALL_STATUS_SUCCESS because r10 is always 0 for a standard TDVMCALL exit.

Get/set TDVMCALL inputs/outputs from/to vp_enter_args directly in struct vcpu_tdx, following the removal of the helpers for reading/writing a0~a3 in [3].

Added back MTRR MSR access, but dropped the special handling for TDX guests, aligning with what KVM does for normal VMs.

Dropped tdx_cache_reg().

Updated documents.

TODO
====
Macrofy vt_x86_ops callbacks, as suggested by Sean [4].

Overview
========
EPT violation
-------------
An EPT violation for TDX triggers the x86 MMU code. Note that instruction fetch from shared memory is not allowed for TDX guests; if it occurs, treat it as broken hardware, bug the VM and return an error.

(*Newly updated*) SEPT local retry and wait for SEPT zap logic provides a clean solution to avoid the blind SEPT retries.

EPT misconfiguration
--------------------
EPT misconfiguration shouldn't happen for TDX guests. If it occurs, bug the VM and return an error.

TDVMCALL support
----------------
Support is added to allow TDX guests to issue CPUID, HLT, RDMSR/WRMSR and GetTdVmCallInfo via TDVMCALL.

- CPUID
  For TDX, most CPUID leaf/sub-leaf combinations are virtualized by the TDX module, while some trigger #VE. On #VE, the TDX guest can issue a TDVMCALL with the leaf Instruction.CPUID to request the VMM to emulate the CPUID operation.

- HLT
  A TDX guest can issue a TDVMCALL with HLT, which passes the interrupt-blocked flag. Whether an interrupt is allowed depends on that flag. For NMIs, KVM can't get the NMI-blocked status of the TDX guest, so it always assumes NMIs are allowed.

- MSRs
  Some MSRs are virtualized by the TDX module directly, while others trigger #VE when the guest accesses them. On #VE, TDX guests can issue a TDVMCALL with WRMSR or RDMSR to request emulation in the VMM.
Operations ignored
------------------
Because TDX protects TDX guest state from the VMM and some features are not supported by TDX guests, a number of operations are ignored for TDX guests, including: accesses to CPU state, the VMX preemption timer, accesses to the TSC offset and multiplier, setting up MCE for LMCE enable/disable, and hypercall patching.

Repos
=====
Due to "KVM: VMX: Move common fields of struct" in "TDX vcpu enter/exit" v2 [2], subsequent patches require changes to use the new struct vcpu_vt; refer to the full KVM branch below.

It requires TDX module 1.5.06.00.0744 [5] or later, as mentioned in [2].

A working edk2 commit is 95d8a1c ("UnitTestFrameworkPkg: Use TianoCore mirror of subhook submodule").

The full KVM branch is here:
https://github.com/intel/tdx/tree/tdx_kvm_dev-2025-02-26

A matching QEMU is here:
https://github.com/intel-staging/qemu-tdx/tree/tdx-qemu-wip-2025-02-18

Testing
=======
It has been tested as part of the development branch for the TDX base series. The testing consisted of TDX kvm-unit-tests, booting a Linux TD, and TDX-enhanced KVM selftests.
It also passed the TDX-related test cases defined in the LKVS test suite as described in:
https://github.com/intel/lkvs/blob/main/KVM/docs/lkvs_on_avocado.md

[0] https://lore.kernel.org/kvm/20250224070716.31360-1-yan.y.zhao@xxxxxxxxx
[1] https://lore.kernel.org/kvm/20241210004946.3718496-1-binbin.wu@xxxxxxxxxxxxxxx
[2] https://lore.kernel.org/kvm/20250129095902.16391-1-adrian.hunter@xxxxxxxxx
[3] https://lore.kernel.org/kvm/20250222014225.897298-1-binbin.wu@xxxxxxxxxxxxxxx
[4] https://lore.kernel.org/kvm/Z6v9yjWLNTU6X90d@xxxxxxxxxx
[5] https://github.com/intel/tdx-module/releases/tag/TDX_1.5.06

Binbin Wu (1):
  KVM: TDX: Enable guest access to MTRR MSRs

Isaku Yamahata (16):
  KVM: TDX: Handle EPT violation/misconfig exit
  KVM: TDX: Handle TDX PV CPUID hypercall
  KVM: TDX: Handle TDX PV HLT hypercall
  KVM: x86: Move KVM_MAX_MCE_BANKS to header file
  KVM: TDX: Implement callbacks for MSR operations
  KVM: TDX: Handle TDX PV rdmsr/wrmsr hypercall
  KVM: TDX: Enable guest access to LMCE related MSRs
  KVM: TDX: Handle TDG.VP.VMCALL<GetTdVmCallInfo> hypercall
  KVM: TDX: Add methods to ignore accesses to CPU state
  KVM: TDX: Add method to ignore guest instruction emulation
  KVM: TDX: Add methods to ignore VMX preemption timer
  KVM: TDX: Add methods to ignore accesses to TSC
  KVM: TDX: Ignore setting up mce
  KVM: TDX: Add a method to ignore hypercall patching
  KVM: TDX: Make TDX VM type supported
  Documentation/virt/kvm: Document on Trust Domain Extensions (TDX)

Yan Zhao (3):
  KVM: TDX: Detect unexpected SEPT violations due to pending SPTEs
  KVM: TDX: Retry locally in TDX EPT violation handler on RET_PF_RETRY
  KVM: TDX: Kick off vCPUs when SEAMCALL is busy during TD page removal

 Documentation/virt/kvm/api.rst           |  13 +-
 Documentation/virt/kvm/x86/index.rst     |   1 +
 Documentation/virt/kvm/x86/intel-tdx.rst | 255 ++++++++++++
 arch/x86/include/asm/shared/tdx.h        |   1 +
 arch/x86/include/asm/vmx.h               |   2 +
 arch/x86/kvm/vmx/main.c                  | 482 ++++++++++++++++++++---
 arch/x86/kvm/vmx/posted_intr.c           |   3 +-
 arch/x86/kvm/vmx/tdx.c                   | 381 +++++++++++++++++-
 arch/x86/kvm/vmx/tdx.h                   |  16 +
 arch/x86/kvm/vmx/tdx_arch.h              |  13 +
 arch/x86/kvm/vmx/x86_ops.h               |   6 +
 arch/x86/kvm/x86.c                       |   1 -
 arch/x86/kvm/x86.h                       |   2 +
 13 files changed, 1113 insertions(+), 63 deletions(-)
 create mode 100644 Documentation/virt/kvm/x86/intel-tdx.rst

-- 
2.46.0