On Mon, Jul 17, 2023, Weijiang Yang wrote:
>
> On 6/24/2023 4:51 AM, Sean Christopherson wrote:
> > > 1) Add Supervisor Shadow Stack state support (i.e., XSS.bit12 (CET_S)) into
> > > kernel so that host can support guest Supervisor Shadow Stack MSRs in g/h FPU
> > > context switch.
> > If that's necessary for correct functionality, yes.

...

> the Pros:
>  - Super easy to implement for KVM.
>  - Automatically avoids saving and restoring this data when the vmexit
>    is handled within KVM.
>
> the Cons:
>  - Unnecessarily restores XFEATURE_CET_KERNEL when switching to
>    non-KVM task's userspace.
>  - Forces allocating space for this state on all tasks, whether or not
>    they use KVM, and with likely zero users today and in the near future.
>  - Complicates the FPU optimization thinking by including things that
>    can have no effect on userspace in the FPU
>
> Given the above reasons, I implemented guest CET supervisor state management
> in KVM instead of adding a kernel patch for it.
>
> Below are 3 KVM patches to support it:
>
> Patch 1: Save/reload guest CET supervisor states when necessary:
>
> =======================================================================
>
> commit 16147ede75dee29583b7d42a6621d10d55b63595
> Author: Yang Weijiang <weijiang.yang@xxxxxxxxx>
> Date:   Tue Jul 11 02:26:17 2023 -0400
>
>     KVM: x86: Make guest supervisor states as non-XSAVE managed
>
>     Save and reload guest CET supervisor states, i.e., PL{0,1,2}_SSP,
>     when vCPU context is being swapped before and after userspace
>     <-> kernel entry, also do the same operation when vCPU is sched-in
>     or sched-out.

...
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index e2c549f147a5..7d9cfb7e2fe8 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -11212,6 +11212,31 @@ static void kvm_put_guest_fpu(struct kvm_vcpu *vcpu)
>         trace_kvm_fpu(0);
>  }
>
> +static void kvm_save_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
> +{
> +       preempt_disable();
> +       if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
> +               rdmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
> +               rdmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
> +               rdmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
> +               wrmsrl(MSR_IA32_PL0_SSP, 0);
> +               wrmsrl(MSR_IA32_PL1_SSP, 0);
> +               wrmsrl(MSR_IA32_PL2_SSP, 0);
> +       }
> +       preempt_enable();
> +}
> +
> +static void kvm_reload_cet_supervisor_ssp(struct kvm_vcpu *vcpu)
> +{
> +       preempt_disable();
> +       if (unlikely(guest_can_use(vcpu, X86_FEATURE_SHSTK))) {
> +               wrmsrl(MSR_IA32_PL0_SSP, vcpu->arch.cet_s_ssp[0]);
> +               wrmsrl(MSR_IA32_PL1_SSP, vcpu->arch.cet_s_ssp[1]);
> +               wrmsrl(MSR_IA32_PL2_SSP, vcpu->arch.cet_s_ssp[2]);
> +       }
> +       preempt_enable();
> +}

My understanding is that PL[0-2]_SSP are used only on transitions to the
corresponding privilege level from a *different* privilege level.  That means
KVM should be able to utilize the user_return_msr framework to load the host
values.  Though if Linux ever supports SSS, I'm guessing the core kernel will
have some sort of mechanism to defer loading MSR_IA32_PL0_SSP until an exit to
userspace, e.g. to avoid having to write PL0_SSP, which will presumably be
per-task, on every context switch.

But note my original wording: **if that's necessary**

If nothing in the host ever consumes those MSRs, i.e. if SSS is NOT enabled in
IA32_S_CET, then running host stuff with guest values should be ok.
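For illustration, a rough sketch of what the user_return_msr approach could
look like.  This is NOT a tested implementation; it borrows KVM's existing
kvm_add_user_return_msr()/kvm_set_user_return_msr() helpers from
arch/x86/kvm/x86.c, and the slot variable, init hook, and the assumption that
the host's PL0_SSP value is '0' (no SSS support in the kernel) are all
hypothetical:

```
/*
 * Hypothetical sketch: register PL0_SSP with the user-return framework so
 * that the host value is restored lazily, on the first return to userspace,
 * instead of on every VM-Exit.
 */
static int pl0_ssp_uret_slot;	/* hypothetical slot index */

static void kvm_x86_init_cet_uret_msr(void)
{
	if (kvm_cpu_cap_has(X86_FEATURE_SHSTK))
		pl0_ssp_uret_slot = kvm_add_user_return_msr(MSR_IA32_PL0_SSP);
}

static void kvm_load_guest_pl0_ssp(struct kvm_vcpu *vcpu)
{
	/*
	 * Write the guest value before entering the guest; the framework
	 * snapshots the host value and defers restoring it until the vCPU
	 * task returns to userspace.
	 */
	if (guest_can_use(vcpu, X86_FEATURE_SHSTK))
		kvm_set_user_return_msr(pl0_ssp_uret_slot,
					vcpu->arch.cet_s_ssp[0], -1ull);
}
```

The point of the sketch is the deferral: exits handled entirely within KVM
never pay for restoring the host value.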
KVM only needs to guarantee that it doesn't leak values between guests.  But
that should Just Work, e.g. KVM should load the new vCPU's values if SHSTK is
exposed to the guest, and intercept (to inject #GP) if SHSTK is not exposed to
the guest.

And regardless of which mechanism ends up managing the SSP MSRs, it should
only ever touch PL0_SSP, because Linux never runs anything at CPL1 or CPL2,
i.e. will never consume PL{1,2}_SSP.

Am I missing something?
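The "intercept and inject #GP" part could be a small guard in the MSR
handlers; a hypothetical sketch (the function name is made up, but KVM's
common MSR code does inject #GP when a handler reports failure):

```
/*
 * Hypothetical guard for the PL{0,1,2}_SSP get/set paths: if SHSTK is not
 * exposed to the guest, fail the access so the common MSR code injects #GP.
 * Returns 0 on allowed access, 1 (failure) otherwise.
 */
static int kvm_cet_ssp_msr_allowed(struct kvm_vcpu *vcpu)
{
	if (!guest_can_use(vcpu, X86_FEATURE_SHSTK))
		return 1;
	return 0;
}
```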