On Thu, Oct 6, 2022 at 5:35 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > On Thu, Oct 06, 2022, Jim Mattson wrote: > > On Wed, Sep 28, 2022 at 4:53 PM Jim Mattson <jmattson@xxxxxxxxxx> wrote: > > > > > > According to Intel's document on Indirect Branch Restricted > > > Speculation, "Enabling IBRS does not prevent software from controlling > > > the predicted targets of indirect branches of unrelated software > > > executed later at the same predictor mode (for example, between two > > > different user applications, or two different virtual machines). Such > > > isolation can be ensured through use of the Indirect Branch Predictor > > > Barrier (IBPB) command." This applies to both basic and enhanced IBRS. > > > > > > Since L1 and L2 VMs share hardware predictor modes (guest-user and > > > guest-kernel), hardware IBRS is not sufficient to virtualize > > > IBRS. (The way that basic IBRS is implemented on pre-eIBRS parts, > > > hardware IBRS is actually sufficient in practice, even though it isn't > > > sufficient architecturally.) > > > > > > For virtual CPUs that support IBRS, add an indirect branch prediction > > > barrier on emulated VM-exit, to ensure that the predicted targets of > > > indirect branches executed in L1 cannot be controlled by software that > > > was executed in L2. > > > > > > Since we typically don't intercept guest writes to IA32_SPEC_CTRL, > > > perform the IBPB at emulated VM-exit regardless of the current > > > IA32_SPEC_CTRL.IBRS value, even though the IBPB could technically be > > > deferred until L1 sets IA32_SPEC_CTRL.IBRS, if IA32_SPEC_CTRL.IBRS is > > > clear at emulated VM-exit. > > > > > > This is CVE-2022-2196. > > > > > > Fixes: 5c911beff20a ("KVM: nVMX: Skip IBPB when switching between vmcs01 and vmcs02") > > > Cc: Sean Christopherson <seanjc@xxxxxxxxxx> > > > Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx> > > Code looks good, just some whining about the comments. > > > > --- > > > arch/x86/kvm/vmx/nested.c | 8 ++++++++ > > > arch/x86/kvm/vmx/vmx.c | 8 +++++--- > > > 2 files changed, 13 insertions(+), 3 deletions(-) > > > > > > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c > > > index ddd4367d4826..87993951fe47 100644 > > > --- a/arch/x86/kvm/vmx/nested.c > > > +++ b/arch/x86/kvm/vmx/nested.c > > > @@ -4604,6 +4604,14 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason, > > > > > > vmx_switch_vmcs(vcpu, &vmx->vmcs01); > > > > > > + /* > > > + * For virtual IBRS, > > The "virtual" part confused me (even more so than usual for all this mitigation > stuff). At first I thought "virtual" was referring to some specialized hardware. > > > we have to flush the indirect branch > > Avoid pronouns please. > > > > + * predictors, since L1 and L2 share hardware predictor > > > + * modes. > > Wrap at 80 chars, this easily fits on two lines. Though I'd prefer to make this > comment much longer, I have a terrible time keeping track of this stuff (obviously). > > Is this accurate? > > /* > * If IBRS is advertised to the guest, KVM must flush the indirect Advertising is done on a per-vCPU basis, so "advertised to the vCPU"? > * branch predictors when transitioning from L2 to L1, as L1 expects > * hardware (KVM in this case) to provide separate predictor modes. > * Bare metal isolates VMX root (host) from VMX non-root (guest), but > * doesn't isolate different VMs, i.e. in this case, doesn't provide "different VMCSs"? > * separate modes for L2 vs L1. > */ > > > > + */ > > > + if (guest_cpuid_has(vcpu, X86_FEATURE_SPEC_CTRL)) > > > + indirect_branch_prediction_barrier(); > > > + > > > /* Update any VMCS fields that might have changed while L2 ran */ > > > vmcs_write32(VM_EXIT_MSR_LOAD_COUNT, vmx->msr_autoload.host.nr); > > > vmcs_write32(VM_ENTRY_MSR_LOAD_COUNT, vmx->msr_autoload.guest.nr); > > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > > > index ffe552a82044..bf8b5c9c56ae 100644 > > > --- a/arch/x86/kvm/vmx/vmx.c > > > +++ b/arch/x86/kvm/vmx/vmx.c > > > @@ -1347,9 +1347,11 @@ void vmx_vcpu_load_vmcs(struct kvm_vcpu *vcpu, int cpu, > > > vmcs_load(vmx->loaded_vmcs->vmcs); > > > > > > /* > > > - * No indirect branch prediction barrier needed when switching > > > - * the active VMCS within a guest, e.g. on nested VM-Enter. > > > - * The L1 VMM can protect itself with retpolines, IBPB or IBRS. > > > + * No indirect branch prediction barrier needed when > > > + * switching the active VMCS within a guest, except > > Wrap at 80. > > > > + * for virtual IBRS. To minimize the number of IBPBs > > Same nit on "virtual IBRS". > > > > + * executed, the one to support virtual IBRS is > > > + * handled specially in nested_vmx_vmexit(). > > "nested VM-Exit" instead of nested_vmx_vmexit() to avoid rename hell, though that > one is unlikely to get renamed. > > This work? > > /* > * No indirect branch prediction barrier needed when switching > * the active VMCS within a guest, except if IBRS is advertised "within a vCPU"? > * to the guest. To minimize the number of IBPBs executed, KVM > * performs IBPB on nested VM-Exit (a single nested transition > * may switch the active VMCS multiple times). > */ > Should I send v2, or can this all be handled when applying?