On Mon, Oct 23, 2023, Sean Christopherson wrote:
> +static void guest_measure_pmu_v1(struct kvm_x86_pmu_feature event,
> +				 uint32_t counter_msr, uint32_t nr_gp_counters)
> +{
> +	uint8_t idx = event.f.bit;
> +	unsigned int i;
> +
> +	for (i = 0; i < nr_gp_counters; i++) {
> +		wrmsr(counter_msr + i, 0);
> +		wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
> +		      ARCH_PERFMON_EVENTSEL_ENABLE | intel_pmu_arch_events[idx]);
> +		__asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
> +
> +		if (pmu_is_intel_event_stable(idx))
> +			GUEST_ASSERT_EQ(this_pmu_has(event), !!_rdpmc(i));
> +
> +		wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
> +		      !ARCH_PERFMON_EVENTSEL_ENABLE |
> +		      intel_pmu_arch_events[idx]);
> +		wrmsr(counter_msr + i, 0);
> +		__asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
> +
> +		if (pmu_is_intel_event_stable(idx))
> +			GUEST_ASSERT(!_rdpmc(i));
> +	}
> +
> +	GUEST_DONE();
> +}
> +
> +static void guest_measure_loop(uint8_t idx)
> +{
> +	const struct {
> +		struct kvm_x86_pmu_feature gp_event;
> +	} intel_event_to_feature[] = {
> +		[INTEL_ARCH_CPU_CYCLES]            = { X86_PMU_FEATURE_CPU_CYCLES },
> +		[INTEL_ARCH_INSTRUCTIONS_RETIRED]  = { X86_PMU_FEATURE_INSNS_RETIRED },
> +		[INTEL_ARCH_REFERENCE_CYCLES]      = { X86_PMU_FEATURE_REFERENCE_CYCLES },
> +		[INTEL_ARCH_LLC_REFERENCES]        = { X86_PMU_FEATURE_LLC_REFERENCES },
> +		[INTEL_ARCH_LLC_MISSES]            = { X86_PMU_FEATURE_LLC_MISSES },
> +		[INTEL_ARCH_BRANCHES_RETIRED]      = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED },
> +		[INTEL_ARCH_BRANCHES_MISPREDICTED] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED },
> +	};
> +
> +	uint32_t nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
> +	uint32_t pmu_version = this_cpu_property(X86_PROPERTY_PMU_VERSION);
> +	struct kvm_x86_pmu_feature gp_event;
> +	uint32_t counter_msr;
> +	unsigned int i;
> +
> +	if (rdmsr(MSR_IA32_PERF_CAPABILITIES) & PMU_CAP_FW_WRITES)
> +		counter_msr = MSR_IA32_PMC0;
> +	else
> +		counter_msr = MSR_IA32_PERFCTR0;
> +
> +	gp_event = intel_event_to_feature[idx].gp_event;
> +	TEST_ASSERT_EQ(idx, gp_event.f.bit);
> +
> +	if (pmu_version < 2) {
> +		guest_measure_pmu_v1(gp_event, counter_msr, nr_gp_counters);

Looking at this again, testing guest PMU version 1 is practically impossible
because this testcase doesn't force the guest PMU version.  I.e. unless I'm
missing something, this requires old hardware or running in a VM with its PMU
forced to '1'.

And if all subtests use similar inputs, the common configuration can be shoved
into pmu_vm_create_with_one_vcpu().

It's easy enough to fold test_intel_arch_events() into test_intel_counters(),
which will also provide coverage for running with full-width writes enabled.
The only downside is that the total runtime will be longer.
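E.g. something like this (rough, completely untested sketch; guest_test_counters()
is a placeholder name for the guest side, and I haven't verified that forcing a
lower X86_PROPERTY_PMU_VERSION via vcpu_set_cpuid_property() is all that's needed):

	static void test_intel_counters(void)
	{
		uint8_t max_pmu_version = kvm_cpu_property(X86_PROPERTY_PMU_VERSION);
		struct kvm_vcpu *vcpu;
		struct kvm_vm *vm;
		uint8_t v;

		/*
		 * Run the guest code once per PMU version up to the maximum
		 * supported version, including v1, so that the v1 path gets
		 * exercised instead of depending on the host PMU version.
		 */
		for (v = 1; v <= max_pmu_version; v++) {
			vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_test_counters);
			vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_VERSION, v);

			/*
			 * Toggling full-width writes (PMU_CAP_FW_WRITES in
			 * MSR_IA32_PERF_CAPABILITIES) would slot in here as an
			 * inner loop.
			 */
			run_vcpu(vcpu);
			kvm_vm_free(vm);
		}
	}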
> +static void test_arch_events_cpuid(uint8_t i, uint8_t j, uint8_t idx)
> +{
> +	uint8_t arch_events_unavailable_mask = BIT_ULL(j);
> +	uint8_t arch_events_bitmap_size = BIT_ULL(i);
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_vm *vm;
> +
> +	vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_measure_loop);
> +
> +	vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH,
> +				arch_events_bitmap_size);
> +	vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EVENTS_MASK,
> +				arch_events_unavailable_mask);
> +
> +	vcpu_args_set(vcpu, 1, idx);
> +
> +	run_vcpu(vcpu);
> +
> +	kvm_vm_free(vm);
> +}
> +
> +static void test_intel_arch_events(void)
> +{
> +	uint8_t idx, i, j;
> +
> +	for (idx = 0; idx < NR_INTEL_ARCH_EVENTS; idx++) {

There's no need to iterate over each event in the host; we can simply add a
wrapper for guest_measure_loop() in the guest.  That'll be slightly faster
since it won't require creating and destroying a VM for every event.

> +		/*
> +		 * A brute force iteration of all combinations of values is
> +		 * likely to exhaust the limit of the single-threaded thread
> +		 * fd nums, so it's test by iterating through all valid
> +		 * single-bit values.
> +		 */
> +		for (i = 0; i < NR_INTEL_ARCH_EVENTS; i++) {

This is flawed/odd.  'i' becomes arch_events_bitmap_size, i.e. it's a length,
but the length is computed by BIT(i).  That's nonsensical and will eventually
result in undefined behavior.  Oof, that'll actually happen sooner rather than
later, because arch_events_bitmap_size is only a single byte, i.e. when the
number of events hits 9, this will try to shove 256 into an 8-bit variable.

The more correct approach would be to pass in 0..NR_INTEL_ARCH_EVENTS inclusive
as the size.  But I think we should actually test 0..length+1, where "length"
is the max of the native length and NR_INTEL_ARCH_EVENTS, i.e. we should verify
KVM handles a size larger than the native length.

> +			for (j = 0; j < NR_INTEL_ARCH_EVENTS; j++)
> +				test_arch_events_cpuid(i, j, idx);

And here, I think it makes sense to brute force all possible values for at
least one configuration.  There aren't actually _that_ many values, e.g.
currently it's 64 (I think).  E.g. test the native PMU version with the "full"
length, and then test single bits with varying lengths.

I'll send a v6 later this week.
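In case it helps, here's roughly the direction I have in mind (untested sketch;
the upper bound on the length should really be the max of the native length and
NR_INTEL_ARCH_EVENTS, and I haven't checked whether one VM per mask value bumps
into any fd limits):

	static void guest_test_arch_events(void)
	{
		uint8_t idx;

		for (idx = 0; idx < NR_INTEL_ARCH_EVENTS; idx++)
			guest_measure_loop(idx);

		GUEST_DONE();
	}

	static void test_arch_events_cpuid(uint8_t length, uint32_t unavailable_mask)
	{
		struct kvm_vcpu *vcpu;
		struct kvm_vm *vm;

		vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_test_arch_events);

		vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH,
					length);
		vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EVENTS_MASK,
					unavailable_mask);

		run_vcpu(vcpu);
		kvm_vm_free(vm);
	}

	static void test_intel_arch_events(void)
	{
		uint32_t mask;
		uint8_t length, i;

		/* Brute force every unavailable-events mask at the "full" length... */
		for (mask = 0; mask < BIT_ULL(NR_INTEL_ARCH_EVENTS); mask++)
			test_arch_events_cpuid(NR_INTEL_ARCH_EVENTS, mask);

		/*
		 * ...and then single-bit masks for every length, including one
		 * beyond the maximum, to verify KVM handles an oversized length.
		 */
		for (length = 0; length <= NR_INTEL_ARCH_EVENTS + 1; length++) {
			for (i = 0; i < NR_INTEL_ARCH_EVENTS; i++)
				test_arch_events_cpuid(length, BIT_ULL(i));
		}
	}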