[PATCH RFC 0/8] KVM: x86/pmu: Enable Fixed Counter3 and Topdown Perf Metrics

Like Xu <like.xu.linux@xxxxxxxxx> · Mon, 12 Dec 2022 20:58:36 +0800

Hi,

The Ice Lake core PMU provides built-in support for the level 1 metrics of
the Top-down Microarchitecture Analysis (TMA) method. These metrics are
always available to cross-validate performance observations, which frees
general purpose counters to count other events when counter utilization
is high. For more details about the method, refer to the Top-Down Analysis
Method chapter (Appendix B.1) of the Intel® 64 and IA-32 Architectures
Optimization Reference Manual, and to SDM section 19.3.9.3, "Performance
Metrics".
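
As an illustration of how these built-in metrics are consumed, the sketch
below decodes the level 1 fields of PERF_METRICS (MSR 0x329). Each metric is
an 8-bit fraction (out of 0xff) of the slots count: Retiring in bits 7:0,
Bad Speculation in 15:8, Frontend Bound in 23:16 and Backend Bound in 31:24.
The slots * fraction / 0xff conversion mirrors the one used by the Linux x86
perf driver. The enum and function names are hypothetical, and the sketch
ignores any fields above bit 31.

```c
#include <assert.h>
#include <stdint.h>

/* Level 1 fields in bits 31:0 of PERF_METRICS, 8 bits each. */
enum td_metric {
	TD_RETIRING = 0,	/* bits  7:0  */
	TD_BAD_SPEC = 1,	/* bits 15:8  */
	TD_FE_BOUND = 2,	/* bits 23:16 */
	TD_BE_BOUND = 3,	/* bits 31:24 */
};

/* Hypothetical: extract the 8-bit fraction reported for one metric. */
static uint8_t td_fraction(uint64_t perf_metrics, enum td_metric m)
{
	assert((unsigned int)m <= TD_BE_BOUND);
	return (uint8_t)((perf_metrics >> ((unsigned int)m * 8)) & 0xff);
}

/*
 * Hypothetical: scale a fraction back to slots, floor(slots * frac / 0xff).
 * Splitting slots into quotient and remainder of 0xff keeps the
 * intermediate products from overflowing 64 bits.
 */
static uint64_t td_slots_for(uint64_t slots, uint8_t frac)
{
	uint64_t q = slots / 0xff, r = slots % 0xff;

	return q * frac + (r * frac) / 0xff;
}
```

For example, with 34,505,682 slots a PERF_METRICS value of 0x1C6C0C69
decodes to the same four counts as the perf stat sample later in this
letter.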

This patchset enables Intel guest Topdown for KVM-based guests. The basic
enabling framework remains unchanged. A PERF_METRICS MSR is introduced.
To obtain the hardware resources, KVM creates a group of perf_events
(rather than a single one) bound to fixed counter3. The guest value of the
PERF_METRICS MSR is then assembled from the counts of the grouped
perf_events.
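
To make the last step more concrete, here is a minimal sketch of packing
grouped counts into a guest PERF_METRICS value. It is not the code in patch
8. The helper names are hypothetical, and round-to-nearest is an assumption
(truncation would lose up to one unit per field). The four fields are not
forced to sum to 0xff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: express one grouped-event count as an 8-bit
 * fraction of slots, rounded to nearest. Assumes slots < 2^56 so that
 * count * 0xff cannot overflow.
 */
static uint8_t td_count_to_fraction(uint64_t count, uint64_t slots)
{
	assert(slots != 0);
	if (count >= slots)
		return 0xff;
	return (uint8_t)((count * 0xff + slots / 2) / slots);
}

/*
 * Hypothetical helper: pack the four level 1 counts (retiring,
 * bad speculation, frontend bound, backend bound) into bits 31:0
 * of a PERF_METRICS-style value, 8 bits per metric.
 */
static uint64_t td_pack_perf_metrics(uint64_t slots, const uint64_t counts[4])
{
	uint64_t val = 0;

	if (slots == 0)
		return 0;	/* nothing counted yet */
	for (unsigned int i = 0; i < 4; i++)
		val |= (uint64_t)td_count_to_fraction(counts[i], slots) << (i * 8);
	return val;
}
```

For instance, the counts in the perf stat sample in this letter (34,505,682
slots; 14,208,222, 1,623,796, 14,614,171 and 3,788,859 for the four metrics)
pack to 0x1C6C0C69 under this rounding.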

On the KVM side, patches 0004-0006 may be reviewed independently if KVM
only enables fixed counter3 as a normal slots event for counting and
sampling. Patch 7 updates the infrastructure for creating grouped events
in KVM, and patch 8 uses the grouped events to emulate the guest
MSR_PERF_METRICS.

On the perf side, patches 0001-0003 are awaiting review for tip/perf/core
and could be accepted separately if they make sense. TBH, I don't think
perf/core is fully prepared to support grouped counters in kernel space,
considering the comments around perf_enable_diasable(). But after much
exploration on my part, this is probably the most promising way to get
KVM to create slots plus metrics events. If the addition of *group_leader
messes things up, please shout at me and tell me what you need.

The individual commit messages contain more details and may answer
code-related questions.

A typical perf tool invocation on a Linux guest looks like this:
$ perf stat --topdown --td-level=1 -I1000 --no-metric-only sleep 1
#           time             counts unit events
     1.000548528         34,505,682      slots
     1.000548528         14,208,222      topdown-retiring                 #     41.5% Retiring
     1.000548528          1,623,796      topdown-bad-spec                 #      4.7% Bad Speculation
     1.000548528         14,614,171      topdown-fe-bound                 #     42.7% Frontend Bound
     1.000548528          3,788,859      topdown-be-bound                 #     11.1% Backend Bound
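
The percentages in this output can be checked by hand. They are consistent
with dividing each metric count by the sum of the four metric counts
(34,235,048) rather than by slots. Dividing by slots would give 41.2% for
Retiring instead of 41.5%. A small sketch of that arithmetic follows. The
struct and helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four level 1 counts perf reports. */
struct td_level1 {
	uint64_t retiring;
	uint64_t bad_spec;
	uint64_t fe_bound;
	uint64_t be_bound;
};

/* Percentage of one metric, normalized by the sum of all four metrics. */
static double td_percent(uint64_t metric, const struct td_level1 *td)
{
	uint64_t sum = td->retiring + td->bad_spec + td->fe_bound + td->be_bound;

	assert(sum != 0);
	return 100.0 * (double)metric / (double)sum;
}
```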

The related kvm-unit-tests (KUT) will follow if there are no blocking
objections.

Nit: the prerequisite patches are:
https://lore.kernel.org/kvm/20221207071506.15733-2-likexu@xxxxxxxxxxx/
https://lore.kernel.org/kvm/20221205122048.16023-1-likexu@xxxxxxxxxxx/

Please feel free to comment and share your feedback.

Thanks,

Like Xu (8):
  perf/core: Add *group_leader to perf_event_create_kernel_counter()
  perf: x86/core: Expose the available number of the Topdown metrics
  perf: x86/core: Sync PERF_METRICS bit together with fixed counter3
  KVM: x86/pmu: Add Intel CPUID-hinted Topdown Slots event
  KVM: x86/pmu: Add kernel-defined slots event to enable Fixed Counter3
  KVM: x86/pmu: properly use INTEL_PMC_FIXED_RDPMC_BASE macro
  KVM: x86/pmu: Use flex *event arrays to implement grouped events
  KVM: x86/pmu: Add MSR_PERF_METRICS MSR emulation to enable Topdown

 arch/arm64/kvm/pmu-emul.c                 |   4 +-
 arch/x86/events/core.c                    |   1 +
 arch/x86/events/intel/core.c              |   3 +
 arch/x86/include/asm/kvm_host.h           |  14 +-
 arch/x86/include/asm/msr-index.h          |   1 +
 arch/x86/include/asm/perf_event.h         |   1 +
 arch/x86/kernel/cpu/resctrl/pseudo_lock.c |   4 +-
 arch/x86/kvm/pmu.c                        | 149 ++++++++++++++++++++--
 arch/x86/kvm/pmu.h                        |  31 +++--
 arch/x86/kvm/svm/pmu.c                    |   1 +
 arch/x86/kvm/vmx/pmu_intel.c              |  53 +++++++-
 arch/x86/kvm/vmx/vmx.c                    |   3 +
 arch/x86/kvm/x86.c                        |   9 +-
 include/linux/perf_event.h                |   1 +
 kernel/events/core.c                      |   4 +-
 kernel/events/hw_breakpoint.c             |   4 +-
 kernel/events/hw_breakpoint_test.c        |   2 +-
 kernel/watchdog_hld.c                     |   2 +-
 18 files changed, 239 insertions(+), 48 deletions(-)

-- 
2.38.2