On Mon, Dec 30, 2024, Borislav Petkov wrote:
> On Mon, Dec 16, 2024 at 10:51:13AM -0800, Sean Christopherson wrote:
> Note the WARN_ON_ONCE bracketing. But I know you're doing this on purpose - to
> see if I'm paying attention and not taking your patch blindly :-P

LOL, yeah, totally on purpose.

> With that fixed, this approach still doesn't look sane to me: before I start
> the guest I have all SPEC_REDUCE bits correctly clear:
>
> # rdmsr -a 0xc001102e | uniq -c
>     128 420000
>
> ... start a guest, shut it down cleanly, qemu exits properly...
>
> # rdmsr -a 0xc001102e | uniq -c ...
> so SPEC_REDUCE remains set on some cores. Not good since I'm not running VMs
> anymore.
>
> # rmmod kvm_amd kvm
> # rdmsr -a 0xc001102e | uniq -c
>     128 420000
>
> that looks more like it.

The "host" value will only be restored when the CPU exits to userspace, so if
there are no userspace tasks running on those CPUs, i.e. nothing that forces
them back to userspace, then it's expected for them to have the "guest" value
loaded, even after the guest is long gone. Unloading KVM effectively forces KVM
to simulate a return to userspace and thus restore the host values.

It seems unlikely that someone would care deeply about the performance of a CPU
that is only running kernel code, but I agree it's odd and not exactly desirable.

> Also, this user-return MSR toggling does show up higher in the profile:
>
>    4.31%  qemu-system-x86  [kvm]              [k] 0x000000000000d23f
>    2.44%  qemu-system-x86  [kernel.kallsyms]  [k] read_tsc
>    1.66%  qemu-system-x86  [kernel.kallsyms]  [k] native_write_msr
>    1.50%  qemu-system-x86  [kernel.kallsyms]  [k] native_write_msr_safe
>
> vs
>
>    1.01%  qemu-system-x86  [kernel.kallsyms]  [k] native_write_msr
>    0.81%  qemu-system-x86  [kernel.kallsyms]  [k] native_write_msr_safe
>
> so it really is noticeable.

Hmm, mostly out of curiosity, what's the "workload"? And do you know what 0xd23f
corresponds to? For most setups, exits all the way to userspace are relatively
uncommon. There are scenarios where the number of userspace exits is quite high,
e.g. if the guest is spamming its emulated serial console, but I wouldn't expect
switching the MSR on user entry/exit to be that noticeable.

> So I wanna say, let's do the below and be done with it. My expectation is that
> this won't be needed in the future anymore either so it'll be a noop on most
> machines...

Yeah, especially if this is all an improvement over the existing mitigation.
Though since it can impact non-virtualization workloads, maybe it should be a
separately selectable mitigation? I.e. not piggybacked on top of ibpb-vmexit?
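
For reference, the deferred-restore behavior described above (the "guest" value
lingering until the CPU returns to userspace or KVM is unloaded) can be modeled
with a toy userspace sketch. This is not KVM's actual code: the names
(cpu_state, load_guest_value, return_to_user, unload_module) and the GUEST_BIT
position are hypothetical, and only the 0x420000 host value is taken from the
rdmsr output above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS   4            /* illustrative; the real box has 128 */
#define HOST_VAL  0x420000ULL  /* host value seen in the rdmsr output */
#define GUEST_BIT (1ULL << 4)  /* assumed bit position, illustration only */

struct cpu_state {
	uint64_t msr;          /* simulated per-CPU MSR contents */
	bool restore_pending;  /* host value still needs to be restored */
};

struct cpu_state cpus[NR_CPUS];

/* Put every simulated CPU into the clean "host" state. */
void init_cpus(void)
{
	for (int i = 0; i < NR_CPUS; i++) {
		cpus[i].msr = HOST_VAL;
		cpus[i].restore_pending = false;
	}
}

/* Running a vCPU on this CPU: load the guest value, defer the restore. */
void load_guest_value(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	cpus[cpu].msr = HOST_VAL | GUEST_BIT;
	cpus[cpu].restore_pending = true;
}

/* Only a return to userspace on this CPU restores the host value. */
void return_to_user(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (!cpus[cpu].restore_pending)
		return;
	cpus[cpu].msr = HOST_VAL;
	cpus[cpu].restore_pending = false;
}

/* Module unload: simulate a return to userspace on every CPU. */
void unload_module(void)
{
	for (int i = 0; i < NR_CPUS; i++)
		return_to_user(i);
}
```

In this model, calling load_guest_value(1) and then never calling
return_to_user(1) leaves CPU 1 holding the guest value indefinitely, which
matches the rdmsr output after the guest exits; unload_module() puts every CPU
back to HOST_VAL, which matches the output after rmmod.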