On Thu, Jun 25, 2020 at 10:09:13AM +0200, Paolo Bonzini wrote:
> On 25/06/20 08:15, Sean Christopherson wrote:
> > IMO, kvm_cpuid() is simply buggy.  If KVM attempts to access a non-existent
> > MSR then it darn well should warn.
> >
> > diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
> > index 8a294f9747aa..7ef7283011d6 100644
> > --- a/arch/x86/kvm/cpuid.c
> > +++ b/arch/x86/kvm/cpuid.c
> > @@ -1013,7 +1013,8 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
> >  		*ebx = entry->ebx;
> >  		*ecx = entry->ecx;
> >  		*edx = entry->edx;
> > -		if (function == 7 && index == 0) {
> > +		if (function == 7 && index == 0 && (*ebx | (F(RTM) | F(HLE))) &&
> > +		    (vcpu->arch.arch_capabilities & ARCH_CAP_TSX_CTRL_MSR)) {
> >  			u64 data;
> >  			if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
> >  			    (data & TSX_CTRL_CPUID_CLEAR))
> >
>
> That works too, but I disagree that warning is the correct behavior
> here.  It certainly should warn as long as kvm_get_msr blindly returns
> zero.  However, for a guest it's fine to access a potentially
> non-existent MSR if you're ready to trap the #GP, and the point of this
> series is to let cpuid.c or any other KVM code do the same.

I get the "what" of the change, and even the "why" to some extent, but I
dislike the idea of supporting/encouraging blind reads/writes to MSRs.
Blind writes are just asking for problems, and suppressing warnings on reads
is almost guaranteed to be suppressing a KVM bug.
Case in point, looking at the TSX thing again, I actually think the fix
should be:

diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 5eb618dbf211..64322446e590 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1013,9 +1013,9 @@ bool kvm_cpuid(struct kvm_vcpu *vcpu, u32 *eax, u32 *ebx,
 		*ebx = entry->ebx;
 		*ecx = entry->ecx;
 		*edx = entry->edx;
-		if (function == 7 && index == 0) {
+		if (function == 7 && index == 0 && (*ebx & (F(RTM) | F(HLE)))) {
 			u64 data;
-			if (!__kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data, true) &&
+			if (!kvm_get_msr(vcpu, MSR_IA32_TSX_CTRL, &data) &&
 			    (data & TSX_CTRL_CPUID_CLEAR))
 				*ebx &= ~(F(RTM) | F(HLE));
 		}

On VMX, MSR_IA32_TSX_CTRL will be added to the so-called shared MSR array
regardless of whether or not it is being advertised to userspace (this is
a bug in its own right).  Using the host_initiated variant means KVM will
incorrectly bypass VMX's ARCH_CAP_TSX_CTRL_MSR check, i.e. incorrectly clear
the bits if userspace is being weird and stuffed MSR_IA32_TSX_CTRL without
advertising it to the guest.

In short, the whole MSR_IA32_TSX_CTRL implementation seems messy and this
is just papering over that mess.  The correct fix is to invoke setup_msrs()
on writes to MSR_IA32_ARCH_CAPABILITIES, filtering MSR_IA32_TSX_CTRL out of
shared MSRs when it's not advertised, and change kvm_cpuid() to use the
unprivileged variant.

TSX_CTRL aside, if we insist on pointing a gun at our foot at some point,
this should be a dedicated flavor of MSR access, e.g. msr_data.kvm_initiated,
so that it at least requires intentionally loading the gun.
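
To make the "dedicated flavor" idea concrete, here is a rough userspace model
of it, not kernel code.  Every name below (enum msr_origin, struct model_vcpu,
the model_* helpers and MODEL_* constants) is made up for illustration and
is not an existing KVM symbol.  The sketch assumes that only a userspace
(host_initiated) access may bypass the "is the MSR advertised?" check, and
that KVM-internal reads get their own explicit origin rather than borrowing
the host one:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model; none of these are real KVM definitions. */
enum msr_origin {
	MSR_ORIGIN_GUEST,	/* RDMSR/WRMSR executed by the guest */
	MSR_ORIGIN_HOST,	/* userspace access, i.e. host_initiated */
	MSR_ORIGIN_KVM,		/* the proposed explicit "kvm_initiated" flavor */
};

struct model_vcpu {
	bool tsx_ctrl_advertised;	/* stands in for ARCH_CAP_TSX_CTRL_MSR */
	uint64_t tsx_ctrl;		/* stands in for MSR_IA32_TSX_CTRL */
};

#define MODEL_TSX_CTRL_CPUID_CLEAR	(1ull << 1)
#define MODEL_F_HLE			(1u << 4)
#define MODEL_F_RTM			(1u << 11)

/* Returns 0 on success, 1 if the access would fault (#GP in a guest). */
static int model_get_tsx_ctrl(const struct model_vcpu *vcpu,
			      enum msr_origin origin, uint64_t *data)
{
	/* Only userspace may read an MSR that was never advertised. */
	if (origin != MSR_ORIGIN_HOST && !vcpu->tsx_ctrl_advertised)
		return 1;

	*data = vcpu->tsx_ctrl;
	return 0;
}

/* Models the CPUID.7.0:EBX fixup using the explicit KVM-internal flavor. */
static uint32_t model_cpuid7_ebx(const struct model_vcpu *vcpu, uint32_t ebx)
{
	uint64_t data;

	if ((ebx & (MODEL_F_RTM | MODEL_F_HLE)) &&
	    !model_get_tsx_ctrl(vcpu, MSR_ORIGIN_KVM, &data) &&
	    (data & MODEL_TSX_CTRL_CPUID_CLEAR))
		ebx &= ~(MODEL_F_RTM | MODEL_F_HLE);

	return ebx;
}
```

In this model, userspace stuffing tsx_ctrl without advertising it leaves
RTM/HLE untouched in the CPUID output, because the KVM-internal read does
not inherit the host's bypass.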