On 2023/6/14 at 5:00 AM, Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> On Mon, Jun 05, 2023, Gao,Shiyuan wrote:
> > On Fri, Jun 3, 2023, Jim Mattson wrote:
> > > On Fri, Jun 2, 2023 at 3:52 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > > On Fri, Jun 02, 2023, Jim Mattson wrote:
> > > > > On Fri, Jun 2, 2023 at 2:48 PM Sean Christopherson <seanjc@xxxxxxxxxx> wrote:
> > > > > > On Fri, Jun 02, 2023, Jim Mattson wrote:
> > > > >
> > > > > Um, yeah. Userspace can clear bit 35 from the saved
> > > > > IA32_PERF_GLOBAL_CTRL MSR so that the migration will complete. But
> > > > > what happens the next time the guest tries to set bit 35 in
> > > > > IA32_PERF_GLOBAL_CTRL, which it will probably do, since it cached
> > > > > CPUID.0AH at boot?
> > > >
> > > > Ah, right. Yeah, guest is hosed.
> > > >
> > > > I'm still not convinced this is KVM's problem to fix.
> > >
> > > One could argue that userspace should have known better than to
> > > believe KVM_GET_SUPPORTED_CPUID in the first place. Or that it should
> > > have known better than to blindly pass that through to KVM_SET_CPUID2.
> > > I mean, *obviously* KVM didn't really support TOPDOWN.SLOTS. Right?
> > >
> > > But if userspace can't trust KVM_GET_SUPPORTED_CPUID to tell it about
> > > which fixed counters are supported, how is it supposed to find out?
> > >
> > > Another way of solving this, which should make everyone happy, is to
> > > add KVM support for TOPDOWN.SLOTS.
> >
> > Yeah, this way may make everyone happy, but we need to guarantee that a VM
> > without TOPDOWN.SLOTS support can still migrate successfully. I think that
> > also needs to be addressed with a quirk like the one in this submission.
> >
> > I can't find an elegant solution...
>
> I can't think of an elegant solution either. That said, I still don't think we
> should add a quirk to upstream KVM. This is not a longstanding KVM goof that
> userspace has come to rely on, it's a combination of bugs in KVM, QEMU, and the
> deployment (for presumably not validating before pushing to production). And the
> issue affects only relatively new CPUs. Silently suppressing a known bad config
> also makes me uncomfortable, even though it's unlikely that any deployment would
> rather terminate VMs than run with a messed up vPMU.
>
> I'm not dead set against a quirk, but unless the issue affects a broad set of
> users, I would prefer to not carry anything in upstream, and instead have the
> (hopefully small set of) users carry an out-of-tree hack-a-fix until all their
> affected VMs are rebooted on a fixed KVM and/or QEMU.

As long as KVM limits the maximum number of vPMU fixed counters to 3, I think
the check on IA32_PERF_GLOBAL_CTRL bits 35-63 is unnecessary. Maybe define a
macro such as IA32_PERF_GLOBAL_CTRL_RESERVED next to MAX_FIXED_COUNTERS, and
skip the check from bit IA32_PERF_GLOBAL_CTRL_RESERVED up to bit 63:

 #define MAX_FIXED_COUNTERS 3
+#define IA32_PERF_GLOBAL_CTRL_RESERVED 35

 static inline bool kvm_valid_perf_global_ctrl(struct kvm_pmu *pmu, u64 data)
 {
-	return !(pmu->global_ctrl_mask & data);
+	return !(pmu->global_ctrl_mask &
+		 (data & ((1ULL << IA32_PERF_GLOBAL_CTRL_RESERVED) - 1)));
 }
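
For reference, here is a minimal standalone sketch of the proposed masking
(plain userspace C; struct kvm_pmu and global_ctrl_mask are stubbed out here
and are not the real KVM definitions), just to illustrate which writes would
pass the check:

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_FIXED_COUNTERS 3
#define IA32_PERF_GLOBAL_CTRL_RESERVED 35

/* Stub of KVM's struct, just enough for the check below. */
struct kvm_pmu {
	uint64_t global_ctrl_mask;	/* set bits mark reserved bits */
};

static inline bool kvm_valid_perf_global_ctrl(struct kvm_pmu *pmu,
					      uint64_t data)
{
	/* Ignore bits 35-63 entirely, per the proposal above. */
	return !(pmu->global_ctrl_mask &
		 (data & ((1ULL << IA32_PERF_GLOBAL_CTRL_RESERVED) - 1)));
}

int main(void)
{
	/*
	 * Example vPMU: 8 GP counters (bits 0-7) and 3 fixed counters
	 * (bits 32-34) are valid; all other bits are reserved.
	 */
	struct kvm_pmu pmu = {
		.global_ctrl_mask = ~(0xffULL | (0x7ULL << 32)),
	};

	/* Bit 35 (TOPDOWN.SLOTS) set by a migrated guest: now ignored. */
	assert(kvm_valid_perf_global_ctrl(&pmu, (1ULL << 35) | 0x1));

	/* A genuinely reserved low bit (e.g. bit 8) is still rejected. */
	assert(!kvm_valid_perf_global_ctrl(&pmu, 1ULL << 8));

	printf("masking behaves as expected\n");
	return 0;
}

Note this silently accepts writes to bits 35-63 instead of injecting #GP, so
it runs into the same "silently suppressing a known bad config" concern
raised above; the upside is that a migrated guest that cached CPUID.0AH at
boot keeps running.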