On Fri, Feb 03, 2023, Like Xu wrote: > On 3/2/2023 2:06 am, Sean Christopherson wrote: > > On Thu, Feb 02, 2023, Like Xu wrote: > > > On 1/2/2023 12:02 am, Sean Christopherson wrote: > > > The perf interface only provides host PMU capabilities and the logic for > > > choosing to disable (or enable) vPMU based on perf input should be left > > > in the KVM part so that subsequent development work can add most code > > > to the just KVM, which is very helpful for downstream users to upgrade > > > loadable KVM module rather than the entire core kernel. > > > > > > My experience interacting with the perf subsystem has taught me that > > > perf change required from KVM should be made as small as possible. > > > > I don't disagree, but I don't think that's relevant in this case. Perf doesn't > > provide the necessary bits for KVM to virtualize a hybrid PMU, so unless KVM is > > somehow able to get away with enumerating a very stripped down vPMU, additional > > modifications to perf_get_x86_pmu_capability() will be required. > > > > What I care more about though is this ugliness in perf_get_x86_pmu_capability(): > > > > /* > > * KVM doesn't support the hybrid PMU yet. > > * Return the common value in global x86_pmu, > > * which available for all cores. > > I would have expected w/ current code base, vpmu (excluding pebs and lbr, intel_pt) > to continue to work on any type of pCPU until you decide to disable them completely. Didn't follow this. > Moreover, the caller of perf_get_x86_pmu_capability() may be more than just KVM, > it may be technically ebpf helpers. The diff on comments from v1 can be applied to > this version (restrict KVM semantics), and it makes the status quo clearer > to KVM users. In that case, eBPF is just as hosed, no? And given that the only people that have touched perf_get_x86_pmu_capability() in its 11+ years of existence are all KVM people, I have a hard time believing there is meaningful use outside of KVM. > > */ > > cap->num_counters_gp = x86_pmu.num_counters; > > > > I really don't want to leave that comment lying around as it's flat out wrong in > > that it obviously doesn't address the other differences beyond the number of > > counters. And since there are dependencies on perf, my preference is to disable > > PMU enumeration in perf specifically so that whoever takes on vPMU enabling is > > forced to consider the perf side of things, and get buy in from the perf folks. > > The perf_get_x86_pmu_capability() obviously needs to be revamped, > but until real effective KVM enabling work arrives, any inconsequential intrusion > into perf/core code will only lead to trivial system maintenance. Trivial doesn't mean useless or unnecessary though. IMO, there's value in capturing, in code, that perf_get_x86_pmu_capability() doesn't properly support hybrid vPMUs. That said, poking around perf, checking is_hybrid() is wrong. This quirk suggests that if E-cores are disabled via BIOS, (a) X86_FEATURE_HYBRID_CPU is _supposed_ to be cleared, and (b) the base PMU will reflect the P-core PMU. I.e. someone can enable vPMU by disabling E-cores. /* * Quirk: For some Alder Lake machine, when all E-cores are disabled in * a BIOS, the leaf 0xA will enumerate all counters of P-cores. However, * the X86_FEATURE_HYBRID_CPU is still set. The above codes will * mistakenly add extra counters for P-cores. Correct the number of * counters here. */ if ((pmu->num_counters > 8) || (pmu->num_counters_fixed > 4)) { pmu->num_counters = x86_pmu.num_counters; pmu->num_counters_fixed = x86_pmu.num_counters_fixed; } Side topic, someone (*cough* Intel) should fix that, e.g. detect the scenario during boot and manually clear X86_FEATURE_HYBRID_CPU. I'm also ok explicitly disabling support in KVM, but since we need to update perf as well (that KVM comment needs to go), I don't see any reason not to also update perf_get_x86_pmu_capability(). How about this? Maybe split over two patches to separate the KVM and perf changes? diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c index 85a63a41c471..d096b04bf80e 100644 --- a/arch/x86/events/core.c +++ b/arch/x86/events/core.c @@ -2974,17 +2974,19 @@ unsigned long perf_misc_flags(struct pt_regs *regs) void perf_get_x86_pmu_capability(struct x86_pmu_capability *cap) { - if (!x86_pmu_initialized()) { + /* This API doesn't currently support enumerating hybrid PMUs. */ + if (WARN_ON_ONCE(cpu_feature_enabled(X86_FEATURE_HYBRID_CPU)) || + !x86_pmu_initialized()) { memset(cap, 0, sizeof(*cap)); return; } + /* + * Note, hybrid CPU models get tracked as having hybrid PMUs even when + * all E-cores are disabled via BIOS. When E-cores are disabled, the + * base PMU holds the correct number of counters for P-cores. + */ cap->version = x86_pmu.version; - /* - * KVM doesn't support the hybrid PMU yet. - * Return the common value in global x86_pmu, - * which available for all cores. - */ cap->num_counters_gp = x86_pmu.num_counters; cap->num_counters_fixed = x86_pmu.num_counters_fixed; cap->bit_width_gp = x86_pmu.cntval_bits; diff --git a/arch/x86/kvm/pmu.h b/arch/x86/kvm/pmu.h index cdb91009701d..933165663703 100644 --- a/arch/x86/kvm/pmu.h +++ b/arch/x86/kvm/pmu.h @@ -165,15 +165,27 @@ static inline void kvm_init_pmu_capability(void) { bool is_intel = boot_cpu_data.x86_vendor == X86_VENDOR_INTEL; - perf_get_x86_pmu_capability(&kvm_pmu_cap); - - /* - * For Intel, only support guest architectural pmu - * on a host with architectural pmu. - */ - if ((is_intel && !kvm_pmu_cap.version) || !kvm_pmu_cap.num_counters_gp) + /* + * Hybrid PMUs don't play nice with virtualization unless userspace + * pins vCPUs _and_ can enumerate accurate informations to the guest. + * Disable vPMU support for hybrid PMUs until KVM gains a way to let + * userspace opt into the dangers of hybrid vPMUs. + */ + if (cpu_feature_enabled(X86_FEATURE_HYBRID_CPU)) enable_pmu = false; + if (enable_pmu) { + perf_get_x86_pmu_capability(&kvm_pmu_cap); + + /* + * For Intel, only support guest architectural pmu + * on a host with architectural pmu. + */ + if ((is_intel && !kvm_pmu_cap.version) || + !kvm_pmu_cap.num_counters_gp) + enable_pmu = false; + } + if (!enable_pmu) { memset(&kvm_pmu_cap, 0, sizeof(kvm_pmu_cap)); return;