On Fri, Sep 22, 2023, Jim Mattson wrote: > On Fri, Sep 22, 2023 at 10:21 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > IMO, we should delete the offending kernel code. I don't see how it provides any > > value these days. > > Sure, but that doesn't help legacy guests. Heh, IMO they don't need help, their owners just need to be placated ;-) > > And *if* we want to change something in KVM so that we stop getting coustomer > > complaints about a useless bit, just let userspace stuff the bit. > > We want to make customers happy. That should not even be a question. Can we really not tell them "this is a benign guest bug, ignore it"? > > I think we should also raise the issue with AMD (Borislav maybe?) and ask/demand > > that bits in HWCR that KVM allows to be set are architecturally defined. It's > > totally fine if the value of bit 24 is uarch specific, but the behavior needs to > > be something that won't change from processor to processor. > > > > > kvm_pmu_refresh(vcpu); > > > vcpu->arch.cr4_guest_rsvd_bits = > > > __cr4_reserved_bits(guest_cpuid_has, vcpu); > > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > > > index 3421ed7fcee0..cb02a7c2938b 100644 > > > --- a/arch/x86/kvm/x86.c > > > +++ b/arch/x86/kvm/x86.c > > > @@ -3699,12 +3699,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info) > > > data &= ~(u64)0x40; /* ignore flush filter disable */ > > > data &= ~(u64)0x100; /* ignore ignne emulation enable */ > > > data &= ~(u64)0x8; /* ignore TLB cache disable */ > > > + data &= ~(u64)0x1000000;/* ignore TscFreqSel */ > > > > > > /* Handle McStatusWrEn */ > > > if (data & ~BIT_ULL(18)) { > > > kvm_pr_unimpl_wrmsr(vcpu, msr, data); > > > return 1; > > > } > > > + > > > + /* > > > + * When set, TscFreqSel is read-only. Attempts to > > > + * clear it are ignored. > > > + */ > > > + data |= vcpu->arch.msr_hwcr & BIT_ULL(24); > > > > > > The bit is read-only from the guest, but KVM needs to let userspace clear the > > bit. > > Why? We don't let userspace clear bit 1 of EFLAGS, which is also a > "reads as one" bit. Because that's architectural behavior, not dependent on FMS, and KVM needs to write EFLAGS to have any hope of being useful, i.e. giving ownership of EFLAGS to userspace is not a realistic option. As proposed, if userspace sets CPUID to a magic FMS, and then changes the FMS to something else, kvm_vcpu_after_set_cpuid() will not clear the bit and KVM will end up wrongly enumerating the bit. I doubt userspace would ever do that, but it's at least possible. That could be fixed by actively clearing vcpu->arch.msr_hwcr for other FMS values, but then KVM would have to be 100% precise on the FMS matching, which would be a maintenance nightmare. In other words, userspace owns the vCPU model, and for good reasons. KVM needs to allow userspace to define a sane model, but with *very* few exceptions, KVM should not try to "help" userspace by stuffing bits. Pretty much everytime KVM tries to help, it causes problems. E.g. initializing perf_capabilities to kvm_caps.supported_perf_cap seems like a good thing, except it presents a bogus model if userspace decides to not enumerate a vPMU to the guest (Aaron was allegedly going to send a patch for this...).