On Wed, Sep 18, 2024, Colton Lewis wrote:
> Test events on core counters by iterating through every combination of
> events in amd_pmu_zen_events with every core counter.
>
> For each combination, calculate the appropriate register addresses for
> the event selection/control register and the counter register. The
> base addresses and layout schemes change depending on whether we have
> the CoreExt feature.
>
> To do the testing, reuse GUEST_TEST_EVENT to run a standard known
> workload. Decouple it from guest_assert_event_count (now
> guest_assert_intel_event_count) to generalize to AMD.
>
> Then assert the most specific detail that can be reasonably known
> about the counter result. Exact count is defined and known for some
> events and for other events merely asserted to be nonzero.
>
> Note on exact counts: AMD counts one more branch than Intel for the
> same workload. Though I can't confirm a reason, the only thing it
> could be is the boundary of the loop instruction being counted
> differently. Presumably, when the counter reaches 0 and execution
> continues to the next instruction, AMD counts this as a branch and
> Intel doesn't.

IIRC, VMRUN is counted as a branch instruction for the guest.  Assuming my
memory is correct, that means this test is going to be flaky, as an
asynchronous exit, e.g. due to a host IRQ, during the measurement loop will
inflate the count.

I'm not entirely sure what to do about that :-(
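One sketch (emphasis on sketch, this is untested, and the helper name and
slack value below are invented): tolerate a bounded overcount on AMD instead
of requiring an exact match, since an asynchronous exit can only ever inflate
the count.  Assuming the test's existing NUM_BRANCH_INSNS_RETIRED:

	static void guest_assert_branches_retired(uint64_t count)
	{
		/*
		 * Hypothetical slack value: each asynchronous VM-Exit during
		 * the measurement loop can retire one extra branch (VMRUN),
		 * so accept a small overcount, but never an undercount.
		 */
		const uint64_t slack = 8;

		GUEST_ASSERT(count >= NUM_BRANCH_INSNS_RETIRED &&
			     count <= NUM_BRANCH_INSNS_RETIRED + slack);
	}

That obviously weakens the assertion, but it may beat a flaky test.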
> +static void __guest_test_core_event(uint8_t event_idx, uint8_t counter_idx)
> +{
> +	/* One fortunate area of actual compatibility! This register

	/*
	 * This is the proper format for multi-line comments.  We are not the
	 * crazy net/ folks.
	 */

> +	 * layout is the same for both AMD and Intel.

It's not, actually.  There are differences in the layout, it just so happens
that the differences don't throw a wrench in things.  The comments in
tools/testing/selftests/kvm/include/x86_64/pmu.h document this fairly well, I
don't see any reason to have a comment here.

> +	 */
> +	uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
> +		ARCH_PERFMON_EVENTSEL_ENABLE |
> +		amd_pmu_zen_events[event_idx];

Align the indentation:

	uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
			    ARCH_PERFMON_EVENTSEL_ENABLE |
			    amd_pmu_zen_events[event_idx];

> +	bool core_ext = this_cpu_has(X86_FEATURE_PERF_CTR_EXT_CORE);
> +	uint64_t esel_msr_base = core_ext ? MSR_F15H_PERF_CTL : MSR_K7_EVNTSEL0;
> +	uint64_t cnt_msr_base = core_ext ? MSR_F15H_PERF_CTR : MSR_K7_PERFCTR0;
> +	uint64_t msr_step = core_ext ? 2 : 1;
> +	uint64_t esel_msr = esel_msr_base + msr_step * counter_idx;
> +	uint64_t cnt_msr = cnt_msr_base + msr_step * counter_idx;

This pattern of code is copy+pasted in three functions.  Please add macros
and/or helpers to consolidate things.  These should also be uint32_t, not 64.

It's a bit evil, but one approach would be to add a macro to iterate over all
PMU counters.  Eating the VM-Exit for the CPUID to get
X86_FEATURE_PERF_CTR_EXT_CORE each time is unfortunate, but I doubt/hope it's
not problematic in practice.  If the cost is meaningful, we could figure out a
way to cache the info, e.g. something awful like this might work:

	/* Note, this relies on guest state being recreated between each test. */
	static int has_perfctr_core = -1;

	if (has_perfctr_core == -1)
		has_perfctr_core = this_cpu_has(X86_FEATURE_PERFCTR_CORE);

	if (has_perfctr_core) {

E.g. with the pattern consolidated into a helper plus an iterator macro, the
test could end up looking like:

static bool get_pmu_counter_msrs(int idx, uint32_t *eventsel, uint32_t *pmc)
{
	if (this_cpu_has(X86_FEATURE_PERFCTR_CORE)) {
		*eventsel = MSR_F15H_PERF_CTL + idx * 2;
		*pmc = MSR_F15H_PERF_CTR + idx * 2;
	} else {
		*eventsel = MSR_K7_EVNTSEL0 + idx;
		*pmc = MSR_K7_PERFCTR0 + idx;
	}
	return true;
}

#define for_each_pmu_counter(_i, _nr_counters, _eventsel, _pmc)	\
	for (_i = 0; _i < _nr_counters; _i++)				\
		if (get_pmu_counter_msrs(_i, &_eventsel, &_pmc))

static void guest_test_core_events(void)
{
	uint8_t nr_counters = guest_nr_core_counters();
	uint32_t eventsel_msr, pmc_msr;
	int i, j;

	for (i = 0; i < NR_AMD_ZEN_EVENTS; i++) {
		for_each_pmu_counter(j, nr_counters, eventsel_msr, pmc_msr) {
			uint64_t eventsel = ARCH_PERFMON_EVENTSEL_OS |
					    ARCH_PERFMON_EVENTSEL_ENABLE |
					    amd_pmu_zen_events[i];

			GUEST_TEST_EVENT(pmc_msr, eventsel_msr, eventsel, "");
			guest_assert_amd_event_count(i, j, pmc_msr);

			if (!is_forced_emulation_enabled)
				continue;

			GUEST_TEST_EVENT(pmc_msr, eventsel_msr, eventsel, KVM_FEP);
			guest_assert_amd_event_count(i, j, pmc_msr);
		}
	}
}
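For completeness, a rough sketch of what the AMD-side assertion could look
like; event_has_exact_count() and amd_event_expected_count() are hypothetical
helpers, the real bookkeeping would live next to the Zen event table:

	static void guest_assert_amd_event_count(uint8_t evt_idx, uint8_t cntr_idx,
						 uint32_t pmc_msr)
	{
		uint64_t count = rdmsr(pmc_msr);

		/*
		 * Per the changelog, an exact count is only defined for a
		 * subset of events; everything else can only be asserted to
		 * be nonzero.  Both helpers here are hypothetical.
		 */
		if (event_has_exact_count(evt_idx))
			GUEST_ASSERT_EQ(count, amd_event_expected_count(evt_idx));
		else
			GUEST_ASSERT(count);
	}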