On 2024-03-08 at 12:58:38 -0800, Isaku Yamahata wrote: > On Thu, Mar 07, 2024 at 04:32:16PM +0800, > Chen Yu <yu.c.chen@xxxxxxxxx> wrote: > > > On 2024-02-26 at 00:26:22 -0800, isaku.yamahata@xxxxxxxxx wrote: > > > From: Isaku Yamahata <isaku.yamahata@xxxxxxxxx> > > > > > > On exiting from the guest TD, xsave state is clobbered. Restore xsave > > > state on TD exit. > > > > > > Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx> > > > --- > > > v19: > > > - Add EXPORT_SYMBOL_GPL(host_xcr0) > > > > > > v15 -> v16: > > > - Added CET flag mask > > > > > > Signed-off-by: Isaku Yamahata <isaku.yamahata@xxxxxxxxx> > > > --- > > > arch/x86/kvm/vmx/tdx.c | 19 +++++++++++++++++++ > > > arch/x86/kvm/x86.c | 1 + > > > 2 files changed, 20 insertions(+) > > > > > > diff --git a/arch/x86/kvm/vmx/tdx.c b/arch/x86/kvm/vmx/tdx.c > > > index 9616b1aab6ce..199226c6cf55 100644 > > > --- a/arch/x86/kvm/vmx/tdx.c > > > +++ b/arch/x86/kvm/vmx/tdx.c > > > @@ -2,6 +2,7 @@ > > > #include <linux/cpu.h> > > > #include <linux/mmu_context.h> > > > > > > +#include <asm/fpu/xcr.h> > > > #include <asm/tdx.h> > > > > > > #include "capabilities.h" > > > @@ -534,6 +535,23 @@ void tdx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event) > > > */ > > > } > > > > > > +static void tdx_restore_host_xsave_state(struct kvm_vcpu *vcpu) > > > +{ > > > + struct kvm_tdx *kvm_tdx = to_kvm_tdx(vcpu->kvm); > > > + > > > + if (static_cpu_has(X86_FEATURE_XSAVE) && > > > + host_xcr0 != (kvm_tdx->xfam & kvm_caps.supported_xcr0)) > > > + xsetbv(XCR_XFEATURE_ENABLED_MASK, host_xcr0); > > > + if (static_cpu_has(X86_FEATURE_XSAVES) && > > > + /* PT can be exposed to TD guest regardless of KVM's XSS support */ > > > + host_xss != (kvm_tdx->xfam & > > > + (kvm_caps.supported_xss | XFEATURE_MASK_PT | TDX_TD_XFAM_CET))) > > > + wrmsrl(MSR_IA32_XSS, host_xss); > > > + if (static_cpu_has(X86_FEATURE_PKU) && > > > + (kvm_tdx->xfam & XFEATURE_MASK_PKRU)) > > > + write_pkru(vcpu->arch.host_pkru); > > > +} > > > > Maybe one minor question regarding the pkru restore. In the non-TDX version > > kvm_load_host_xsave_state(), it first tries to read the current setting > > vcpu->arch.pkru = rdpkru(); if this setting does not equal to host_pkru, > > it trigger the write_pkru on host. Does it mean we can also leverage that mechanism > > in TDX to avoid 1 pkru write(I guess pkru write is costly than a read pkru)? > > Yes, that's the intention. When we set the PKRU feature for the guest, TDX > module unconditionally initialize pkru. I see, thanks for the information. Please correct me if I'm wrong, and I'm not sure if wrpkru instruction would trigger the TD exit. The TDX module spec[1] mentioned PKS (protected key for supervisor pages), but does not metion PKU for user pages. PKS is controlled by MSR IA32_PKRS. The TDX module will passthrough the MSR IA32_PKRS write in TD, because TDX module clears the PKS bitmap in VMCS: https://github.com/intel/tdx-module/blob/tdx_1.5/src/common/helpers/helpers.c#L1723 so neither write to MSR IA32_PKRS nor wrpkru triggers TD exit. However, after a second thought, I found that after commit 72a6c08c44e4, the current code should not be a problem, because write_pkru() would first read the current pkru settings and decide whether to update to the pkru register. > Do you have use case that wrpkru() > (without rdpkru()) is better? I don't have use case yet. But with/without rdpkru() in tdx_restore_host_xsave_state(), there is no much difference because write_pkru() has taken care of it if I understand correctly. thanks, Chenyu