Re: [PATCH 4/7] KVM: TDX: restore host xsave state when exit from the guest TD

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 25, 2024, Adrian Hunter wrote:
> On 22/11/24 07:49, Chao Gao wrote:
> >> +static void tdx_restore_host_xsave_state(struct kvm_vcpu *vcpu)
> >> +{
> >> +	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
> >> +
> >> +	if (static_cpu_has(X86_FEATURE_XSAVE) &&
> >> +	    kvm_host.xcr0 != (kvm_tdx->xfam & kvm_caps.supported_xcr0))
> >> +		xsetbv(XCR_XFEATURE_ENABLED_MASK, kvm_host.xcr0);
> >> +	if (static_cpu_has(X86_FEATURE_XSAVES) &&
> >> +	    /* PT can be exposed to TD guest regardless of KVM's XSS support */
> >> +	    kvm_host.xss != (kvm_tdx->xfam &
> >> +			 (kvm_caps.supported_xss | XFEATURE_MASK_PT |
> >> +			  XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)))
> > 
> > Should we drop CET/PT from this series? I think they are worth a new
> > patch/series.
> 
> This is not really about CET/PT
> 
> What is happening here is that we are calculating the current
> MSR_IA32_XSS value based on the TDX Module spec which says the
> TDX Module sets MSR_IA32_XSS to the XSS bits from XFAM.  The
> TDX Module does that literally, from TDX Module source code:
> 
> 	#define XCR0_SUPERVISOR_BIT_MASK            0x0001FD00
> and
> 	ia32_wrmsr(IA32_XSS_MSR_ADDR, xfam & XCR0_SUPERVISOR_BIT_MASK);
> 
> For KVM, rather than:
> 
> 			kvm_tdx->xfam &
> 			 (kvm_caps.supported_xss | XFEATURE_MASK_PT |
> 			  XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)
> 
> it would be more direct to define the bits and enforce them
> via tdx_get_supported_xfam() e.g.
> 
> /* 
>  * Before returning from TDH.VP.ENTER, the TDX Module assigns:
>  *   XCR0 to the TD’s user-mode feature bits of XFAM (bits 7:0, 9)
>  *   IA32_XSS to the TD's supervisor-mode feature bits of XFAM (bits 8, 16:10)
>  */
> #define TDX_XFAM_XCR0_MASK	(GENMASK(7, 0) | BIT(9))
> #define TDX_XFAM_XSS_MASK	(GENMASK(16, 10) | BIT(8))
> #define TDX_XFAM_MASK		(TDX_XFAM_XCR0_MASK | TDX_XFAM_XSS_MASK)
> 
> static u64 tdx_get_supported_xfam(const struct tdx_sys_info_td_conf *td_conf)
> {
> 	u64 val = kvm_caps.supported_xcr0 | kvm_caps.supported_xss;
> 
> 	/* Ensure features are in the masks */
> 	val &= TDX_XFAM_MASK;
> 
> 	if ((val & td_conf->xfam_fixed1) != td_conf->xfam_fixed1)
> 		return 0;
> 
> 	val &= td_conf->xfam_fixed0;
> 
> 	return val;
> }
> 
> and then:
> 
> 	if (static_cpu_has(X86_FEATURE_XSAVE) &&
> 	    kvm_host.xcr0 != (kvm_tdx->xfam & TDX_XFAM_XCR0_MASK))
> 		xsetbv(XCR_XFEATURE_ENABLED_MASK, kvm_host.xcr0);
> 	if (static_cpu_has(X86_FEATURE_XSAVES) &&
> 	    kvm_host.xss != (kvm_tdx->xfam & TDX_XFAM_XSS_MASK))
> 		wrmsrl(MSR_IA32_XSS, kvm_host.xss);
> 
> > 
> >> +		wrmsrl(MSR_IA32_XSS, kvm_host.xss);
> >> +	if (static_cpu_has(X86_FEATURE_PKU) &&
> > 
> > How about using cpu_feature_enabled()? It is used in kvm_load_host_xsave_state()
> > It handles the case where CONFIG_X86_INTEL_MEMORY_PROTECTION_KEYS is not
> > enabled.

I would rather just use kvm_load_host_xsave_state(), by forcing vcpu->arch.{xcr0,xss}
to XFAM, with a comment explaining that the TDX module sets XCR0 and XSS prior to
returning from VP.ENTER.  I don't see any justificaton for maintaining a special
flow for TDX, it's just more work.  E.g. TDX is missing the optimization to elide
WRPKRU if the current value is the same as the host value.

KVM already disallows emulating a WRMSR to XSS via the tdx_has_emulated_msr()
check, and AFAICT there's no path for the guest to set KVM's view of XCR0, CR0,
or CR4, so I'm pretty sure stuffing state at vCPU creation is all that's needed.

That said, out of paranoia, KVM should disallow the guest from changing XSS if
guest state is protected, i.e. in common code, as XSS is a mandatory passthrough
for SEV-ES+, i.e. XSS is fully guest-owned for both TDX and ES+.

Ditto for CR0 and CR4 (TDX only; SEV-ES+ lets the host see the guest values).
The current TDX code lets KVM read CR0 and CR4, but KVM will always see the RESET
values, which are completely wrong for TDX.  KVM can obviously "work" without a
sane view of guest CR0/CR4, but I think having a sane view will yield code that
is easier to maintain and understand, because almost all special casing will be
in TDX's initialization flow, not spread out wherever KVM needs to know that what
KVM sees in guest state is a lie.

The guest_state_protected check in kvm_load_host_xsave_state() needs to be moved
to svm_vcpu_run(), but IMO that's where the checks belong anyways, because not
restoring host state for protected guests is obviously specific to SEV-ES+ guests,
not to all protected guests.

Side topic, tdx_cache_reg() is ridiculous.  Just mark the "cached" registers as
available on exit.  Invoking a callback just to do nothing is a complete waste.
I'm also not convinced letting KVM read garbage for RIP, RSP, CR3, or PDPTRs is
at all reasonable.  CR3 and PDPTRs should be unreachable, and I gotta imagine the
same holds true for RSP.  Allow reads/writes to RIP is fine, in that it probably
simplifies the overall code.

Something like this (probably won't apply, I have other local hacks as the result
of suggestions).

---
 arch/x86/kvm/svm/svm.c     |  7 ++++--
 arch/x86/kvm/vmx/main.c    |  4 +--
 arch/x86/kvm/vmx/tdx.c     | 50 ++++++++++----------------------------
 arch/x86/kvm/vmx/x86_ops.h |  4 ---
 arch/x86/kvm/x86.c         | 15 +++++++-----
 5 files changed, 28 insertions(+), 52 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index dd15cc635655..63df43e5dcce 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -4251,7 +4251,9 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
 		svm_set_dr6(svm, DR6_ACTIVE_LOW);
 
 	clgi();
-	kvm_load_guest_xsave_state(vcpu);
+
+	if (!vcpu->arch.guest_state_protected)
+		kvm_load_guest_xsave_state(vcpu);
 
 	kvm_wait_lapic_expire(vcpu);
 
@@ -4280,7 +4282,8 @@ static __no_kcsan fastpath_t svm_vcpu_run(struct kvm_vcpu *vcpu,
 	if (unlikely(svm->vmcb->control.exit_code == SVM_EXIT_NMI))
 		kvm_before_interrupt(vcpu, KVM_HANDLING_NMI);
 
-	kvm_load_host_xsave_state(vcpu);
+	if (!vcpu->arch.guest_state_protected)
+		kvm_load_host_xsave_state(vcpu);
 	stgi();
 
 	/* Any pending NMI will happen here */
diff --git a/arch/x86/kvm/vmx/main.c b/arch/x86/kvm/vmx/main.c
index 2742f2af7f55..d2e78e6675b9 100644
--- a/arch/x86/kvm/vmx/main.c
+++ b/arch/x86/kvm/vmx/main.c
@@ -520,10 +520,8 @@ static void vt_sync_dirty_debug_regs(struct kvm_vcpu *vcpu)
 
 static void vt_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
 {
-	if (is_td_vcpu(vcpu)) {
-		tdx_cache_reg(vcpu, reg);
+	if (WARN_ON_ONCE(is_td_vcpu(vcpu)))
 		return;
-	}
 
 	vmx_cache_reg(vcpu, reg);
 }
diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c
index 7eff717c9d0d..b49dcf32206b 100644
--- a/arch/x86/kvm/vmx/tdx.c
+++ b/arch/x86/kvm/vmx/tdx.c
@@ -636,6 +636,9 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 	vcpu->arch.cr0_guest_owned_bits = -1ul;
 	vcpu->arch.cr4_guest_owned_bits = -1ul;
 
+	vcpu->arch.cr4 = <maximal value>;
+	vcpu->arch.cr0 = <maximal value, give or take>;
+
 	vcpu->arch.tsc_offset = kvm_tdx->tsc_offset;
 	vcpu->arch.l1_tsc_offset = vcpu->arch.tsc_offset;
 	/*
@@ -659,6 +662,14 @@ int tdx_vcpu_create(struct kvm_vcpu *vcpu)
 
 	tdx->state = VCPU_TD_STATE_UNINITIALIZED;
 
+	/*
+	 * On return from VP.ENTER, the TDX Module sets XCR0 and XSS to the
+	 * maximal values supported by the guest, so from KVM's perspective,
+	 * those are the guest's values at all times.
+	 */
+	vcpu->arch.ia32_xss = (kvm_tdx->xfam & XFEATURE_SUPERVISOR_MASK);
+	vcpu->arch.xcr0 = (kvm_tdx->xfam & XFEATURE_USE_MASK);
+
 	return 0;
 }
 
@@ -824,24 +835,6 @@ static void tdx_user_return_msr_update_cache(struct kvm_vcpu *vcpu)
 		kvm_user_return_msr_update_cache(tdx_uret_tsx_ctrl_slot, 0);
 }
 
-static void tdx_restore_host_xsave_state(struct kvm_vcpu *vcpu)
-{
-	struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm);
-
-	if (static_cpu_has(X86_FEATURE_XSAVE) &&
-	    kvm_host.xcr0 != (kvm_tdx->xfam & kvm_caps.supported_xcr0))
-		xsetbv(XCR_XFEATURE_ENABLED_MASK, kvm_host.xcr0);
-	if (static_cpu_has(X86_FEATURE_XSAVES) &&
-	    /* PT can be exposed to TD guest regardless of KVM's XSS support */
-	    kvm_host.xss != (kvm_tdx->xfam &
-			 (kvm_caps.supported_xss | XFEATURE_MASK_PT |
-			  XFEATURE_MASK_CET_USER | XFEATURE_MASK_CET_KERNEL)))
-		wrmsrl(MSR_IA32_XSS, kvm_host.xss);
-	if (static_cpu_has(X86_FEATURE_PKU) &&
-	    (kvm_tdx->xfam & XFEATURE_MASK_PKRU))
-		write_pkru(vcpu->arch.host_pkru);
-}
-
 static union vmx_exit_reason tdx_to_vmx_exit_reason(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_tdx *tdx = to_tdx(vcpu);
@@ -941,10 +934,10 @@ fastpath_t tdx_vcpu_run(struct kvm_vcpu *vcpu, bool force_immediate_exit)
 	tdx_vcpu_enter_exit(vcpu);
 
 	tdx_user_return_msr_update_cache(vcpu);
-	tdx_restore_host_xsave_state(vcpu);
+	kvm_load_host_xsave_state(vcpu);
 	tdx->host_state_need_restore = true;
 
-	vcpu->arch.regs_avail &= ~VMX_REGS_LAZY_LOAD_SET;
+	vcpu->arch.regs_avail = TDX_REGS_UNSUPPORTED_SET;
 
 	if (unlikely((tdx->vp_enter_ret & TDX_SW_ERROR) == TDX_SW_ERROR))
 		return EXIT_FASTPATH_NONE;
@@ -1963,23 +1956,6 @@ int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr)
 	}
 }
 
-void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg)
-{
-	kvm_register_mark_available(vcpu, reg);
-	switch (reg) {
-	case VCPU_REGS_RSP:
-	case VCPU_REGS_RIP:
-	case VCPU_EXREG_PDPTR:
-	case VCPU_EXREG_CR0:
-	case VCPU_EXREG_CR3:
-	case VCPU_EXREG_CR4:
-		break;
-	default:
-		KVM_BUG_ON(1, vcpu->kvm);
-		break;
-	}
-}
-
 static int tdx_get_capabilities(struct kvm_tdx_cmd *cmd)
 {
 	const struct tdx_sys_info_td_conf *td_conf = &tdx_sysinfo->td_conf;
diff --git a/arch/x86/kvm/vmx/x86_ops.h b/arch/x86/kvm/vmx/x86_ops.h
index ef60eb7b1245..efa6723837c6 100644
--- a/arch/x86/kvm/vmx/x86_ops.h
+++ b/arch/x86/kvm/vmx/x86_ops.h
@@ -145,8 +145,6 @@ bool tdx_has_emulated_msr(u32 index);
 int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
 int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);
 
-void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg);
-
 int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp);
 
 int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
@@ -193,8 +191,6 @@ static inline bool tdx_has_emulated_msr(u32 index) { return false; }
 static inline int tdx_get_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; }
 static inline int tdx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr) { return 1; }
 
-static inline void tdx_cache_reg(struct kvm_vcpu *vcpu, enum kvm_reg reg) {}
-
 static inline int tdx_vcpu_ioctl(struct kvm_vcpu *vcpu, void __user *argp) { return -EOPNOTSUPP; }
 
 static inline int tdx_sept_link_private_spt(struct kvm *kvm, gfn_t gfn,
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 4f94b1e24eae..d380837433c6 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1184,11 +1184,9 @@ EXPORT_SYMBOL_GPL(kvm_lmsw);
 
 void kvm_load_guest_xsave_state(struct kvm_vcpu *vcpu)
 {
-	if (vcpu->arch.guest_state_protected)
-		return;
+	WARN_ON_ONCE(vcpu->arch.guest_state_protected);
 
 	if (kvm_is_cr4_bit_set(vcpu, X86_CR4_OSXSAVE)) {
-
 		if (vcpu->arch.xcr0 != kvm_host.xcr0)
 			xsetbv(XCR_XFEATURE_ENABLED_MASK, vcpu->arch.xcr0);
 
@@ -1207,9 +1205,6 @@ EXPORT_SYMBOL_GPL(kvm_load_guest_xsave_state);
 
 void kvm_load_host_xsave_state(struct kvm_vcpu *vcpu)
 {
-	if (vcpu->arch.guest_state_protected)
-		return;
-
 	if (cpu_feature_enabled(X86_FEATURE_PKU) &&
 	    ((vcpu->arch.xcr0 & XFEATURE_MASK_PKRU) ||
 	     kvm_is_cr4_bit_set(vcpu, X86_CR4_PKE))) {
@@ -3943,6 +3938,10 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
 			return 1;
+
+		if (vcpu->arch.guest_state_protected)
+			return 1;
+
 		/*
 		 * KVM supports exposing PT to the guest, but does not support
 		 * IA32_XSS[bit 8]. Guests have to use RDMSR/WRMSR rather than
@@ -4402,6 +4401,10 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 		if (!msr_info->host_initiated &&
 		    !guest_cpuid_has(vcpu, X86_FEATURE_XSAVES))
 			return 1;
+
+		if (vcpu->arch.guest_state_protected)
+			return 1;
+
 		msr_info->data = vcpu->arch.ia32_xss;
 		break;
 	case MSR_K7_CLK_CTL:

base-commit: 0319082fc23089f516618e193d94da18c837e35a
-- 





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux