Re: [PATCH v2] KVM: x86/pmu: Disable all vPMU features support on Intel hybrid CPUs

Sean Christopherson <seanjc@xxxxxxxxxx> · Fri, 3 Feb 2023 17:28:13 +0000

On Fri, Feb 03, 2023, Like Xu wrote:
> On 3/2/2023 2:06 am, Sean Christopherson wrote:
> > On Thu, Feb 02, 2023, Like Xu wrote:
> > > On 1/2/2023 12:02 am, Sean Christopherson wrote:
> > > The perf interface only provides host PMU capabilities and the logic for
> > > choosing to disable (or enable) vPMU based on perf input should be left
> > > in the KVM part so that subsequent development work can add most code
> > > to the just KVM, which is very helpful for downstream users to upgrade
> > > loadable KVM module rather than the entire core kernel.
> > > 
> > > My experience interacting with the perf subsystem has taught me that
> > > perf change required from KVM should be made as small as possible.
> > 
> > I don't disagree, but I don't think that's relevant in this case.  Perf doesn't
> > provide the necessary bits for KVM to virtualize a hybrid PMU, so unless KVM is
> > somehow able to get away with enumerating a very stripped down vPMU, additional
> > modifications to perf_get_x86_pmu_capability() will be required.
> > 
> > What I care more about though is this ugliness in perf_get_x86_pmu_capability():
> > 
> > 	/*
> > 	 * KVM doesn't support the hybrid PMU yet.
> > 	 * Return the common value in global x86_pmu,
> > 	 * which available for all cores.
> 
> I would have expected w/ current code base, vpmu (excluding pebs and lbr, intel_pt)
> to continue to work on any type of pCPU until you decide to disable them completely.

Didn't follow this.

> Moreover, the caller of perf_get_x86_pmu_capability() may be more than just KVM,
> it may be technically ebpf helpers. The diff on comments from v1 can be applied to
> this version (restrict KVM semantics), and it makes the status quo clearer
> to KVM users.

In that case, eBPF is just as hosed, no?  And given that the only people that have
touched perf_get_x86_pmu_capability() in its 11+ years of existence are all KVM
people, I have a hard time believing there is meaningful use outside of KVM.

> > 	 */
> > 	cap->num_counters_gp	= x86_pmu.num_counters;
> > 
> > I really don't want to leave that comment lying around as it's flat out wrong in
> > that it obviously doesn't address the other differences beyond the number of
> > counters.  And since there are dependencies on perf, my preference is to disable
> > PMU enumeration in perf specifically so that whoever takes on vPMU enabling is
> > forced to consider the perf side of things, and get buy in from the perf folks.
> 
> The perf_get_x86_pmu_capability() obviously needs to be revamped,
> but until real effective KVM enabling work arrives, any inconsequential intrusion
> into perf/core code will only lead to trivial system maintenance.

Trivial doesn't mean useless or unnecessary though.  IMO, there's value in capturing,
in code, that perf_get_x86_pmu_capability() doesn't properly support hybrid vPMUs.

That said, poking around perf, checking is_hybrid() is wrong.  This quirk suggests
that if E-cores are disabled via BIOS, (a) X86_FEATURE_HYBRID_CPU is _supposed_ to
be cleared, and (b) the base PMU will reflect the P-core PMU.  I.e. someone can
enable vPMU by disabling E-cores.

                /*
                 * Quirk: For some Alder Lake machine, when all E-cores are disabled in
                 * a BIOS, the leaf 0xA will enumerate all counters of P-cores. However,
                 * the X86_FEATURE_HYBRID_CPU is still set. The above codes will
                 * mistakenly add extra counters for P-cores. Correct the number of
                 * counters here.
                 */
                if ((pmu->num_counters > 8) || (pmu->num_counters_fixed > 4)) {
                        pmu->num_counters = x86_pmu.num_counters;
                        pmu->num_counters_fixed = x86_pmu.num_counters_fixed;
                }

Side topic, someone (*cough* Intel) should fix that, e.g. detect the scenario
during boot and manually clear X86_FEATURE_HYBRID_CPU.

I'm also ok explicitly disabling support in KVM, but since we need to update
perf as well (that KVM comment needs to go), I don't see any reason not to also
update perf_get_x86_pmu_capability().

How about this?  Maybe split over two patches to separate the KVM and perf changes?

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 85a63a41c471..d096b04bf80e 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2974,17 +2974,19 @@ unsigned long perf_misc_flags(struct pt_regs *regs)
 
 void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
 {
-       if (!x86_pmu_initialized()) {
+       /* This API doesn't currently support enumerating hybrid PMUs. */
+       if (WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_HYBRID_CPU)) ||
+           !x86_pmu_initialized()) {
                memset(cap, 0, sizeof(*cap));
                return;
        }
 
+       /*
+        * Note, hybrid CPU models get tracked as having hybrid PMUs even when
+        * all E-cores are disabled via BIOS.  When E-cores are disabled, the
+        * base PMU holds the correct number of counters for P-cores.
+        */
        cap->version            = x86_pmu.version;
-       /*
-        * KVM doesn't support the hybrid PMU yet.
-        * Return the common value in global x86_pmu,
-        * which available for all cores.
-        */
        cap->num_counters_gp    = x86_pmu.num_counters;
        cap->num_counters_fixed = x86_pmu.num_counters_fixed;
        cap->bit_width_gp       = x86_pmu.cntval_bits;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index cdb91009701d..933165663703 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -165,15 +165,27 @@ static inline void kvm_init_pmu_capability(void)
 {
        bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
 
-       perf_get_x86_pmu_capability(&kvm_pmu_cap);
-
-        /*
-         * For Intel, only support guest architectural pmu
-         * on a host with architectural pmu.
-         */
-       if ((is_intel && !kvm_pmu_cap.version) || !kvm_pmu_cap.num_counters_gp)
+       /*
+        * Hybrid PMUs don't play nice with virtualization unless userspace
+        * pins vCPUs _and_ can enumerate accurate informations to the guest.
+        * Disable vPMU support for hybrid PMUs until KVM gains a way to let
+        * userspace opt into the dangers of hybrid vPMUs.
+       */
+       if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
                enable_pmu = false;
 
+       if (enable_pmu) {
+               perf_get_x86_pmu_capability(&kvm_pmu_cap);
+
+               /*
+                * For Intel, only support guest architectural pmu
+                * on a host with architectural pmu.
+                */
+               if ((is_intel && !kvm_pmu_cap.version) ||
+                   !kvm_pmu_cap.num_counters_gp)
+                       enable_pmu = false;
+       }
+
        if (!enable_pmu) {
                memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
                return;