Re: [PATCH v2 3/4] x86/bugs: KVM: Add support for SRSO_MSR_FIX

Sean Christopherson <seanjc@xxxxxxxxxx> · Wed, 8 Jan 2025 09:18:17 -0800

On Wed, Jan 08, 2025, Borislav Petkov wrote:
> > And do you know what 0xd23f corresponds to?
> 
> How's that:
> 
> $ objdump -D arch/x86/kvm/kvm.ko
> ...
> 000000000000d1a0 <kvm_vcpu_halt>:
>     d1a0:       e8 00 00 00 00          call   d1a5 <kvm_vcpu_halt+0x5>
>     d1a5:       55                      push   %rbp
>     ...
> 
>     d232:       e8 09 93 ff ff          call   6540 <kvm_vcpu_check_block>
>     d237:       85 c0                   test   %eax,%eax
>     d239:       0f 88 f6 01 00 00       js     d435 <kvm_vcpu_halt+0x295>
>     d23f:       f3 90                   pause
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> 
>     d241:       e8 00 00 00 00          call   d246 <kvm_vcpu_halt+0xa6>
>     d246:       48 89 c3                mov    %rax,%rbx
>     d249:       e8 00 00 00 00          call   d24e <kvm_vcpu_halt+0xae>
>     d24e:       84 c0                   test   %al,%al
> 
> 
> Which makes sense :-)

Ooh, it's just the MSR writes that increased.  I misinterpreted the profile
statement and thought that something in KVM was jumping from ~0% to 4.31%.  If
the cost really is just this:

   1.66%  qemu-system-x86  [kernel.kallsyms]        [k] native_write_msr
   1.50%  qemu-system-x86  [kernel.kallsyms]        [k] native_write_msr_safe

vs

   1.01%  qemu-system-x86  [kernel.kallsyms]        [k] native_write_msr
   0.81%  qemu-system-x86  [kernel.kallsyms]        [k] native_write_msr_safe

then my vote is to go with the user_return approach.  It's unfortunate that
restoring full speculation may be delayed until a CPU exits to userspace or KVM
is unloaded, but given that enable_virt_at_load is enabled by default, in practice
it's likely still far better than effectively always running the host with reduced
speculation.
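
To illustrate why deferring the restore keeps the number of MSR writes low,
here is a toy userspace model, not the actual patch.  All names (fake_cpu,
model_vmrun(), model_exit_to_user()) are made up for the sketch, and the "MSR"
is just a bool; the assumption is that the bit is set lazily before entering
the guest and cleared only when the CPU returns to userspace.

```c
/*
 * Toy userspace model of deferring an MSR restore until return to userspace.
 * Not kernel code: struct fake_cpu, model_vmrun() and model_exit_to_user()
 * are hypothetical names, and the "MSR" is modeled as a plain bool.
 */
#include <stdbool.h>

struct fake_cpu {
	bool reduced_spec;		/* models the mitigation bit being set */
	unsigned int msr_writes;	/* number of modeled WRMSRs */
};

/* Before entering a guest: write the MSR only if the bit isn't set yet. */
static void model_vmrun(struct fake_cpu *cpu)
{
	if (!cpu->reduced_spec) {
		cpu->reduced_spec = true;
		cpu->msr_writes++;
	}
}

/* On return to userspace: restore full speculation if it was reduced. */
static void model_exit_to_user(struct fake_cpu *cpu)
{
	if (cpu->reduced_spec) {
		cpu->reduced_spec = false;
		cpu->msr_writes++;
	}
}
```

In this model, any number of back-to-back model_vmrun() calls costs a single
write, plus one more on the eventual model_exit_to_user().  The flip side is
the one noted above: the CPU keeps running with reduced speculation until that
exit happens.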

> > Yeah, especially if this is all an improvement over the existing mitigation.
> > Though since it can impact non-virtualization workloads, maybe it should be a
> > separately selectable mitigation?  I.e. not piggybacked on top of ibpb-vmexit?
> 
> Well, ibpb-on-vmexit is your typical cloud provider scenario where you address
> the VM/VM attack vector by doing an IBPB on VMEXIT. 

No?  svm_vcpu_load() emits IBPB when switching VMCBs, i.e. when switching between
vCPUs that may live in separate security contexts.  That IBPB is skipped when
X86_FEATURE_IBPB_ON_VMEXIT is enabled, because the host is trusted to not attack
its guests.
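
Roughly, and only as a simplified userspace model of the behavior described
above (struct fake_pcpu and model_vcpu_load() are hypothetical names, and the
barrier is reduced to a counter):

```c
/*
 * Simplified model of "IBPB when switching VMCBs, unless IBPB-on-VMEXIT".
 * Not the real svm_vcpu_load(): names are hypothetical and the barrier is
 * modeled as a counter increment.
 */
#include <stdbool.h>
#include <stddef.h>

struct fake_pcpu {
	const void *current_vmcb;	/* last VMCB loaded on this CPU */
	unsigned int ibpb_count;	/* number of modeled IBPBs */
};

static void model_vcpu_load(struct fake_pcpu *pc, const void *vmcb,
			    bool ibpb_on_vmexit)
{
	/* Same VMCB as last time: same security context, nothing to do. */
	if (pc->current_vmcb == vmcb)
		return;

	pc->current_vmcb = vmcb;

	/* With IBPB on every VM-Exit, the barrier here would be redundant. */
	if (!ibpb_on_vmexit)
		pc->ibpb_count++;
}
```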

> This SRSO_MSR_FIX thing protects the *host* from a malicious guest so you
> need both enabled for full protection on the guest/host vector.

If reducing speculation protects the host, why wouldn't that also protect other
guests?  The CPU needs to bounce through the host before entering a different
guest.

And if for some reason reducing speculation doesn't suffice, wouldn't it be
better to fall back to doing IBPB only when switching VMCBs?