Re: [PATCH 2/3] KVM: x86: Virtualize HWCR.TscFreqSel[bit 24]

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Sep 22, 2023, Jim Mattson wrote:
> On Fri, Sep 22, 2023 at 10:21 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > IMO, we should delete the offending kernel code.  I don't see how it provides any
> > value these days.
> 
> Sure, but that doesn't help legacy guests.

Heh, IMO they don't need help, their owners just need to be placated ;-)

> > And *if* we want to change something in KVM so that we stop getting coustomer
> > complaints about a useless bit, just let userspace stuff the bit.
> 
> We want to make customers happy. That should not even be a question.

Can we really not tell them "this is a benign guest bug, ignore it"?

> > I think we should also raise the issue with AMD (Borislav maybe?) and ask/demand
> > that bits in HWCR that KVM allows to be set are architecturally defined.  It's
> > totally fine if the value of bit 24 is uarch specific, but the behavior needs to
> > be something that won't change from processor to processor.
> >
> > >       kvm_pmu_refresh(vcpu);
> > >       vcpu->arch.cr4_guest_rsvd_bits =
> > >           __cr4_reserved_bits(guest_cpuid_has, vcpu);
> > > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > > index 3421ed7fcee0..cb02a7c2938b 100644
> > > --- a/arch/x86/kvm/x86.c
> > > +++ b/arch/x86/kvm/x86.c
> > > @@ -3699,12 +3699,19 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
> > >               data &= ~(u64)0x40;     /* ignore flush filter disable */
> > >               data &= ~(u64)0x100;    /* ignore ignne emulation enable */
> > >               data &= ~(u64)0x8;      /* ignore TLB cache disable */
> > > +             data &= ~(u64)0x1000000;/* ignore TscFreqSel */
> > >
> > >               /* Handle McStatusWrEn */
> > >               if (data & ~BIT_ULL(18)) {
> > >                       kvm_pr_unimpl_wrmsr(vcpu, msr, data);
> > >                       return 1;
> > >               }
> > > +
> > > +             /*
> > > +              * When set, TscFreqSel is read-only. Attempts to
> > > +              * clear it are ignored.
> > > +              */
> > > +             data |= vcpu->arch.msr_hwcr & BIT_ULL(24);
> >
> >
> > The bit is read-only from the guest, but KVM needs to let userspace clear the
> > bit.
> 
> Why? We don't let userspace clear bit 1 of EFLAGS, which is also a
> "reads as one" bit.

Because that's architectural behavior, not dependent on FMS, and KVM needs to
write EFLAGS to have any hope of being useful, i.e. giving ownership of EFLAGS
to userspace is not a realistic option.

As proposed, if userspace sets CPUID to a magic FMS, and then changes the FMS to
something else, kvm_vcpu_after_set_cpuid() will not clear the bit and KVM will
end up wrongly enumerating the bit.  I doubt userspace would ever do that, but
it's at least possible.

That could be fixed by actively clearing vcpu->arch.msr_hwcr for other FMS values,
but then KVM would have to be 100% precise on the FMS matching, which would be a
maintenance nightmare.

In other words, userspace owns the vCPU model, and for good reasons.  KVM needs
to allow userspace to define a sane model, but with *very* few exceptions, KVM
should not try to "help" userspace by stuffing bits.

Pretty much everytime KVM tries to help, it causes problems.  E.g. initializing
perf_capabilities to kvm_caps.supported_perf_cap seems like a good thing, except
it presents a bogus model if userspace decides to not enumerate a vPMU to the
guest (Aaron was allegedly going to send a patch for this...).




[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux