On Wed, Jan 08, 2025, Borislav Petkov wrote:
> > And do you know what 0xd23f corresponds to?
>
> How's that:
>
> $ objdump -D arch/x86/kvm/kvm.ko
> ...
> 000000000000d1a0 <kvm_vcpu_halt>:
>     d1a0:  e8 00 00 00 00        call   d1a5 <kvm_vcpu_halt+0x5>
>     d1a5:  55                    push   %rbp
> ...
>
>     d232:  e8 09 93 ff ff        call   6540 <kvm_vcpu_check_block>
>     d237:  85 c0                 test   %eax,%eax
>     d239:  0f 88 f6 01 00 00     js     d435 <kvm_vcpu_halt+0x295>
>     d23f:  f3 90                 pause
>     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
>     d241:  e8 00 00 00 00        call   d246 <kvm_vcpu_halt+0xa6>
>     d246:  48 89 c3              mov    %rax,%rbx
>     d249:  e8 00 00 00 00        call   d24e <kvm_vcpu_halt+0xae>
>     d24e:  84 c0                 test   %al,%al
>
>
> Which makes sense :-)

Ooh, it's just the MSR writes that increased. I misinterpreted the profile
statement and thought that something in KVM was jumping from ~0% to 4.31%.
If the cost really is just this:

   1.66%  qemu-system-x86  [kernel.kallsyms]  [k] native_write_msr
   1.50%  qemu-system-x86  [kernel.kallsyms]  [k] native_write_msr_safe

vs

   1.01%  qemu-system-x86  [kernel.kallsyms]  [k] native_write_msr
   0.81%  qemu-system-x86  [kernel.kallsyms]  [k] native_write_msr_safe

then my vote is to go with the user_return approach. It's unfortunate that
restoring full speculation may be delayed until a CPU exits to userspace or
KVM is unloaded, but given that enable_virt_at_load is enabled by default, in
practice it's likely still far better than effectively always running the host
with reduced speculation.

> > Yeah, especially if this is all an improvement over the existing mitigation.
> > Though since it can impact non-virtualization workloads, maybe it should be a
> > separately selectable mitigation? I.e. not piggybacked on top of ibpb-vmexit?
>
> Well, ibpb-on-vmexit is your typical cloud provider scenario where you address
> the VM/VM attack vector by doing an IBPB on VMEXIT. No?

svm_vcpu_load() emits IBPB when switching VMCBs, i.e. when switching between
vCPUs that may live in separate security contexts.
That IBPB is skipped when X86_FEATURE_IBPB_ON_VMEXIT is enabled, because the
host is trusted to not attack its guests.

> This SRSO_MSR_FIX thing protects the *host* from a malicious guest so you
> need both enabled for full protection on the guest/host vector.

If reducing speculation protects the host, why wouldn't that also protect other
guests? The CPU needs to bounce through the host before entering a different
guest. And if for some reason reducing speculation doesn't suffice, wouldn't it
be better to fall back to doing IBPB only when switching VMCBs?