> On May 12, 2022, at 1:44 PM, Jon Kohler <jon@xxxxxxxxxxx> wrote:
>
> Avoid expensive rdmsr on every VM-Exit for MSR_IA32_SPEC_CTRL on
> eIBRS enabled systems iff the guest only sets IA32_SPEC_CTRL[0] (IBRS)
> and not [1] (STIBP) or [2] (SSBD), by not disabling interception in
> the MSR bitmap.
>
> eIBRS enabled guests using just IBRS will only write the SPEC_CTRL MSR
> once or twice per vCPU on boot, so it is far better to take those
> VM exits on boot than to read and save this MSR on every single
> VM exit forever. This outcome was suggested in Andrea's commit
> 2f46993d83ff ("x86: change default to spec_store_bypass_disable=prctl
> spectre_v2_user=prctl"); however, since interception is still
> unilaterally disabled, the rdmsr tax remains even after that commit.
>
> This is a significant win for eIBRS enabled systems, as this rdmsr
> accounts for roughly 50% of the time spent in vmx_vcpu_run() as
> observed by perf top disassembly, and it is in the critical path for
> all VM-Exits, including fastpath exits.
>
> Update relevant comments in vmx_vcpu_run() with appropriate SDM
> references for future onlookers.
> Gentle ping on this one
>
> Fixes: 2f46993d83ff ("x86: change default to spec_store_bypass_disable=prctl spectre_v2_user=prctl")
> Signed-off-by: Jon Kohler <jon@xxxxxxxxxxx>
> Cc: Andrea Arcangeli <aarcange@xxxxxxxxxx>
> Cc: Kees Cook <keescook@xxxxxxxxxxxx>
> Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
> Cc: Waiman Long <longman@xxxxxxxxxx>
> ---
>  arch/x86/kvm/vmx/vmx.c | 46 +++++++++++++++++++++++++++++++-----------
>  1 file changed, 34 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index d58b763df855..d9da6fcecd8c 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -2056,6 +2056,25 @@ static int vmx_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
>  		if (kvm_spec_ctrl_test_value(data))
>  			return 1;
>
> +		/*
> +		 * For Intel eIBRS, IBRS (SPEC_CTRL_IBRS aka 0x00000048 BIT(0))
> +		 * only needs to be set once and can be left on forever without
> +		 * needing to be constantly toggled. If the guest attempts to
> +		 * write that value, let's not disable interception. Guests
> +		 * with eIBRS awareness should only be writing SPEC_CTRL_IBRS
> +		 * once per vCPU per boot.
> +		 *
> +		 * The guest can still use other SPEC_CTRL features on top of
> +		 * eIBRS, such as SSBD, and we should disable interception in
> +		 * those situations to avoid a multitude of VM-Exits; however,
> +		 * we will then need to check SPEC_CTRL on each exit to make
> +		 * sure we restore the host value properly.
> +		 */
> +		if (boot_cpu_has(X86_FEATURE_IBRS_ENHANCED) && data == BIT(0)) {
> +			vmx->spec_ctrl = data;
> +			break;
> +		}
> +
>  		vmx->spec_ctrl = data;
>  		if (!data)
>  			break;
> @@ -6887,19 +6906,22 @@ static fastpath_t vmx_vcpu_run(struct kvm_vcpu *vcpu)
>  	vmx_vcpu_enter_exit(vcpu, vmx);
>
>  	/*
> -	 * We do not use IBRS in the kernel. If this vCPU has used the
> -	 * SPEC_CTRL MSR it may have left it on; save the value and
> -	 * turn it off. This is much more efficient than blindly adding
> -	 * it to the atomic save/restore list. Especially as the former
> -	 * (Saving guest MSRs on vmexit) doesn't even exist in KVM.
> -	 *
> -	 * For non-nested case:
> -	 * If the L01 MSR bitmap does not intercept the MSR, then we need to
> -	 * save it.
> +	 * SDM 25.1.3 - handle conditional exit for MSR_IA32_SPEC_CTRL.
> +	 * To prevent constant VM exits for SPEC_CTRL, the kernel may
> +	 * disable interception in the MSR bitmap for the SPEC_CTRL MSR,
> +	 * such that the guest can read and write that MSR without
> +	 * trapping to KVM; however, the guest may then set a different
> +	 * value than the host. For exit handling, do the rdmsr below if
> +	 * interception is disabled, such that we can save the guest
> +	 * value for restore on VM entry, as it does not get saved
> +	 * automatically per SDM 27.3.1.
>  	 *
> -	 * For nested case:
> -	 * If the L02 MSR bitmap does not intercept the MSR, then we need to
> -	 * save it.
> +	 * This behavior is optimized on eIBRS enabled systems, such
> +	 * that the kernel only disables interception for MSR_IA32_SPEC_CTRL
> +	 * when guests choose to use additional SPEC_CTRL features
> +	 * above and beyond IBRS, such as STIBP or SSBD. This
> +	 * optimization allows the kernel to avoid doing the expensive
> +	 * rdmsr below.
>  	 */
>  	if (unlikely(!msr_write_intercepted(vmx, MSR_IA32_SPEC_CTRL)))
>  		vmx->spec_ctrl = native_read_msr(MSR_IA32_SPEC_CTRL);
> --
> 2.30.1 (Apple Git-130)
>