On Wed, Oct 25, 2023, Pawan Gupta wrote:
> During VMentry VERW is executed to mitigate MDS. After VERW, any memory
> access like register push onto stack may put host data in MDS affected
> CPU buffers. A guest can then use MDS to sample host data.
>
> Although likelihood of secrets surviving in registers at current VERW
> callsite is less, but it can't be ruled out. Harden the MDS mitigation
> by moving the VERW mitigation late in VMentry path.
>
> Note that VERW for MMIO Stale Data mitigation is unchanged because of
> the complexity of per-guest conditional VERW which is not easy to handle
> that late in asm with no GPRs available. If the CPU is also affected by
> MDS, VERW is unconditionally executed late in asm regardless of guest
> having MMIO access.
>
> Signed-off-by: Pawan Gupta <pawan.kumar.gupta@xxxxxxxxxxxxxxx>
> ---
>  arch/x86/kvm/vmx/vmenter.S |  3 +++
>  arch/x86/kvm/vmx/vmx.c     | 10 +++++++---
>  2 files changed, 10 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx/vmenter.S b/arch/x86/kvm/vmx/vmenter.S
> index b3b13ec04bac..139960deb736 100644
> --- a/arch/x86/kvm/vmx/vmenter.S
> +++ b/arch/x86/kvm/vmx/vmenter.S
> @@ -161,6 +161,9 @@ SYM_FUNC_START(__vmx_vcpu_run)
>  	/* Load guest RAX. This kills the @regs pointer! */
>  	mov VCPU_RAX(%_ASM_AX), %_ASM_AX
>
> +	/* Clobbers EFLAGS.ZF */
> +	CLEAR_CPU_BUFFERS
> +
>  	/* Check EFLAGS.CF from the VMX_RUN_VMRESUME bit test above. */
>  	jnc .Lvmlaunch
>
> diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
> index 24e8694b83fc..2d149589cf5b 100644
> --- a/arch/x86/kvm/vmx/vmx.c
> +++ b/arch/x86/kvm/vmx/vmx.c
> @@ -7226,13 +7226,17 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
>
>  	guest_state_enter_irqoff();
>
> -	/* L1D Flush includes CPU buffer clear to mitigate MDS */
> +	/*
> +	 * L1D Flush includes CPU buffer clear to mitigate MDS, but VERW
> +	 * mitigation for MDS is done late in VMentry and is still
> +	 * executed inspite of L1D Flush. This is because an extra VERW
> +	 * should not matter much after the big hammer L1D Flush.
> +	 */
>  	if (static_branch_unlikely(&vmx_l1d_should_flush))
>  		vmx_l1d_flush(vcpu);

There's an existing bug here.  vmx_l1d_flush() is not guaranteed to do a flush
in "conditional mode", and is not guaranteed to do a ucode-based flush (though
I can't tell if it's possible for the VERW magic to exist without
X86_FEATURE_FLUSH_L1D).  If we care, something like the diff at the bottom is
probably needed.

> -	else if (cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF))
> -		mds_clear_cpu_buffers();
>  	else if (static_branch_unlikely(&mmio_stale_data_clear) &&
>  		 kvm_arch_has_assigned_device(vcpu->kvm))
> +		/* MMIO mitigation is mutually exclusive with MDS mitigation later in asm */

Please don't put comments inside an if/elif without curly braces (and I don't
want to add curly braces).  Though I think that's a moot point if we first fix
the conditional L1D flush issue.  E.g. when the dust settles we can end up with:

	/*
	 * Note, a ucode-based L1D flush also flushes CPU buffers, i.e. the
	 * manual VERW in __vmx_vcpu_run() to mitigate MDS *may* be redundant.
	 * But an L1D Flush is not guaranteed for "conditional mode", and the
	 * cost of an extra VERW after a full L1D flush is negligible.
	 */
	if (static_branch_unlikely(&vmx_l1d_should_flush))
		cpu_buffers_flushed = vmx_l1d_flush(vcpu);

	/*
	 * The MMIO stale data vulnerability is a subset of the general MDS
	 * vulnerability, i.e. this is mutually exclusive with the VERW that's
	 * done just before VM-Enter.  The vulnerability requires the attacker,
	 * i.e. the guest, to do MMIO, so this "clear" can be done earlier.
	 */
	if (static_branch_unlikely(&mmio_stale_data_clear) &&
	    !cpu_buffers_flushed &&
	    kvm_arch_has_assigned_device(vcpu->kvm))
		mds_clear_cpu_buffers();

>  		mds_clear_cpu_buffers();
>
>  	vmx_disable_fb_clear(vmx);

LOL, nice.
IIUC, setting FB_CLEAR_DIS is mutually exclusive with doing a late VERW, as KVM
will never set FB_CLEAR_DIS if the CPU is susceptible to X86_BUG_MDS.  But the
checks aren't identical, which makes this _look_ sketchy.  Can you do something
like this to ensure we don't accidentally neuter the late VERW?

static void vmx_update_fb_clear_dis(struct kvm_vcpu *vcpu, struct vcpu_vmx *vmx)
{
	vmx->disable_fb_clear = (host_arch_capabilities & ARCH_CAP_FB_CLEAR_CTRL) &&
				!boot_cpu_has_bug(X86_BUG_MDS) &&
				!boot_cpu_has_bug(X86_BUG_TAA);

	if (vmx->disable_fb_clear &&
	    WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_CLEAR_CPU_BUF)))
		vmx->disable_fb_clear = false;

	...
}

--
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 6e502ba93141..cf6e06bb8310 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -6606,8 +6606,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
  * is not exactly LRU. This could be sized at runtime via topology
  * information but as all relevant affected CPUs have 32KiB L1D cache size
  * there is no point in doing so.
+ *
+ * Returns %true if CPU buffers were cleared, i.e. if a microcode-based L1D
+ * flush was executed (which also clears CPU buffers).
  */
-static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
+static noinstr bool vmx_l1d_flush(struct kvm_vcpu *vcpu)
 {
 	int size = PAGE_SIZE << L1D_CACHE_ORDER;
 
@@ -6634,14 +6637,14 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 		kvm_clear_cpu_l1tf_flush_l1d();
 		if (!flush_l1d)
-			return;
+			return false;
 	}
 
 	vcpu->stat.l1d_flush++;
 
 	if (static_cpu_has(X86_FEATURE_FLUSH_L1D)) {
 		native_wrmsrl(MSR_IA32_FLUSH_CMD, L1D_FLUSH);
-		return;
+		return true;
 	}
 
 	asm volatile(
@@ -6665,6 +6668,8 @@ static noinstr void vmx_l1d_flush(struct kvm_vcpu *vcpu)
 	    :: [flush_pages] "r" (vmx_l1d_flush_pages),
 		[size] "r" (size)
 	    : "eax", "ebx", "ecx", "edx");
+
+	return false;
 }
 
 static void vmx_update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
@@ -7222,16 +7227,17 @@ static noinstr void vmx_vcpu_enter_exit(struct kvm_vcpu *vcpu,
 					    unsigned int flags)
 {
 	struct vcpu_vmx *vmx = to_vmx(vcpu);
+	bool cpu_buffers_flushed = false;
 
 	guest_state_enter_irqoff();
 
-	/* L1D Flush includes CPU buffer clear to mitigate MDS */
 	if (static_branch_unlikely(&vmx_l1d_should_flush))
-		vmx_l1d_flush(vcpu);
-	else if (static_branch_unlikely(&mds_user_clear))
-		mds_clear_cpu_buffers();
-	else if (static_branch_unlikely(&mmio_stale_data_clear) &&
-		 kvm_arch_has_assigned_device(vcpu->kvm))
+		cpu_buffers_flushed = vmx_l1d_flush(vcpu);
+
+	if ((static_branch_unlikely(&mds_user_clear) ||
+	     (static_branch_unlikely(&mmio_stale_data_clear) &&
+	      kvm_arch_has_assigned_device(vcpu->kvm))) &&
+	    !cpu_buffers_flushed)
 		mds_clear_cpu_buffers();
 
 	vmx_disable_fb_clear(vmx);