Re: [PATCH v2] KVM: x86/pmu: Disable all vPMU features support on Intel hybrid CPUs

Like Xu <like.xu.linux@xxxxxxxxx> · Mon, 6 Feb 2023 16:52:15 +0800

On 4/2/2023 1:28 am, Sean Christopherson wrote:
On Fri, Feb 03, 2023, Like Xu wrote:
On 3/2/2023 2:06 am, Sean Christopherson wrote:
On Thu, Feb 02, 2023, Like Xu wrote:
On 1/2/2023 12:02 am, Sean Christopherson wrote:
The perf interface only provides host PMU capabilities and the logic for
choosing to disable (or enable) vPMU based on perf input should be left
in the KVM part so that subsequent development work can add most code
to the just KVM, which is very helpful for downstream users to upgrade
loadable KVM module rather than the entire core kernel.

My experience interacting with the perf subsystem has taught me that
perf change required from KVM should be made as small as possible.

I don't disagree, but I don't think that's relevant in this case.  Perf doesn't
provide the necessary bits for KVM to virtualize a hybrid PMU, so unless KVM is
somehow able to get away with enumerating a very stripped down vPMU, additional
modifications to perf_get_x86_pmu_capability() will be required.

What I care more about though is this ugliness in perf_get_x86_pmu_capability():

	/*
	 * KVM doesn't support the hybrid PMU yet.
	 * Return the common value in global x86_pmu,
	 * which available for all cores.

I would have expected w/ current code base, vpmu (excluding pebs and lbr, intel_pt)
to continue to work on any type of pCPU until you decide to disable them completely.

Didn't follow this.

My expectation is that, if a guest doesn't enable "PEBS, LBR and intel_pt",
and only has the most basic pmu conters (its number is the lesser number
of big and small cores supported), with some pmu_event_fileter allow list
mechanism, vPMU works regardless of the vcpu model and does not
require cpu pined. Any complaints from users on this usages ?


Moreover, the caller of perf_get_x86_pmu_capability() may be more than just KVM,
it may be technically ebpf helpers. The diff on comments from v1 can be applied to
this version (restrict KVM semantics), and it makes the status quo clearer
to KVM users.

In that case, eBPF is just as hosed, no?  And given that the only people that have
touched perf_get_x86_pmu_capability() in its 11+ years of existence are all KVM
people, I have a hard time believing there is meaningful use outside of KVM.

Some radical bpf programs will access the pmu directly, although this is
not  uncommon in upstream. KVM colleagues shouldn't need to care
about them, but at least don't mislead them.


	 */
	cap->num_counters_gp	= x86_pmu.num_counters;

I really don't want to leave that comment lying around as it's flat out wrong in
that it obviously doesn't address the other differences beyond the number of
counters.  And since there are dependencies on perf, my preference is to disable
PMU enumeration in perf specifically so that whoever takes on vPMU enabling is
forced to consider the perf side of things, and get buy in from the perf folks.

The perf_get_x86_pmu_capability() obviously needs to be revamped,
but until real effective KVM enabling work arrives, any inconsequential intrusion
into perf/core code will only lead to trivial system maintenance.

Trivial doesn't mean useless or unnecessary though.  IMO, there's value in capturing,
in code, that perf_get_x86_pmu_capability() doesn't properly support hybrid vPMUs.

That said, poking around perf, checking is_hybrid() is wrong.  This quirk suggests
that if E-cores are disabled via BIOS, (a) X86_FEATURE_HYBRID_CPU is _supposed_ to
be cleared, and (b) the base PMU will reflect the P-core PMU.  I.e. someone can
enable vPMU by disabling E-cores.

                 /*
                  * Quirk: For some Alder Lake machine, when all E-cores are disabled in
                  * a BIOS, the leaf 0xA will enumerate all counters of P-cores. However,
                  * the X86_FEATURE_HYBRID_CPU is still set. The above codes will

Sigh. Then what if E-cores are manually offline via "/.../cpu$/online" and then 
init kvm module ?
I suggest leaving these open issues to that enabling guy (or maybe it's still me).

                  * mistakenly add extra counters for P-cores. Correct the number of
                  * counters here.
                  */
                 if ((pmu->num_counters > 8) || (pmu->num_counters_fixed > 4)) {
                         pmu->num_counters = x86_pmu.num_counters;
                         pmu->num_counters_fixed = x86_pmu.num_counters_fixed;
                 }

Side topic, someone (*cough* Intel) should fix that, e.g. detect the scenario
during boot and manually clear X86_FEATURE_HYBRID_CPU.

Maybe they did it on purpose.


I'm also ok explicitly disabling support in KVM, but since we need to update
perf as well (that KVM comment needs to go), I don't see any reason not to also
update perf_get_x86_pmu_capability().

How about this?  Maybe split over two patches to separate the KVM and perf changes?

OK, applying your diff below or mine V2 as a KVM move is both fine to me. Just 
thanks.

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 85a63a41c471..d096b04bf80e 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2974,17 +2974,19 @@ unsigned long perf_misc_flags(struct pt_regs *regs)
  
  void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap)
  {
-       if (!x86_pmu_initialized()) {
+       /* This API doesn't currently support enumerating hybrid PMUs. */
+       if (WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_HYBRID_CPU)) ||
+           !x86_pmu_initialized()) {
                 memset(cap, 0, sizeof(*cap));
                 return;
         }
  
+       /*
+        * Note, hybrid CPU models get tracked as having hybrid PMUs even when
+        * all E-cores are disabled via BIOS.  When E-cores are disabled, the
+        * base PMU holds the correct number of counters for P-cores.
+        */
         cap->version            = x86_pmu.version;
-       /*
-        * KVM doesn't support the hybrid PMU yet.
-        * Return the common value in global x86_pmu,
-        * which available for all cores.
-        */
         cap->num_counters_gp    = x86_pmu.num_counters;
         cap->num_counters_fixed = x86_pmu.num_counters_fixed;
         cap->bit_width_gp       = x86_pmu.cntval_bits;
diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h
index cdb91009701d..933165663703 100644
--- a/arch/x86/kvm/pmu.h
+++ b/arch/x86/kvm/pmu.h
@@ -165,15 +165,27 @@ static inline void kvm_init_pmu_capability(void)
  {
         bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL;
  
-       perf_get_x86_pmu_capability(&kvm_pmu_cap);
-
-        /*
-         * For Intel, only support guest architectural pmu
-         * on a host with architectural pmu.
-         */
-       if ((is_intel && !kvm_pmu_cap.version) || !kvm_pmu_cap.num_counters_gp)
+       /*
+        * Hybrid PMUs don't play nice with virtualization unless userspace
+        * pins vCPUs _and_ can enumerate accurate informations to the guest.
+        * Disable vPMU support for hybrid PMUs until KVM gains a way to let
+        * userspace opt into the dangers of hybrid vPMUs.
+       */
+       if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU))
                 enable_pmu = false;
  
+       if (enable_pmu) {
+               perf_get_x86_pmu_capability(&kvm_pmu_cap);
+
+               /*
+                * For Intel, only support guest architectural pmu
+                * on a host with architectural pmu.
+                */
+               if ((is_intel && !kvm_pmu_cap.version) ||
+                   !kvm_pmu_cap.num_counters_gp)
+                       enable_pmu = false;
+       }
+
         if (!enable_pmu) {
                 memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap));
                 return;