Re: [PATCH v5 08/13] KVM: selftests: Test Intel PMU architectural events on gp counters

On Mon, Oct 23, 2023, Sean Christopherson wrote:
> +static void guest_measure_pmu_v1(struct kvm_x86_pmu_feature event,
> +				 uint32_t counter_msr, uint32_t nr_gp_counters)
> +{
> +	uint8_t idx = event.f.bit;
> +	unsigned int i;
> +
> +	for (i = 0; i < nr_gp_counters; i++) {
> +		wrmsr(counter_msr + i, 0);
> +		wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
> +		      ARCH_PERFMON_EVENTSEL_ENABLE | intel_pmu_arch_events[idx]);
> +		__asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
> +
> +		if (pmu_is_intel_event_stable(idx))
> +			GUEST_ASSERT_EQ(this_pmu_has(event), !!_rdpmc(i));
> +
> +		wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS |
> +		      !ARCH_PERFMON_EVENTSEL_ENABLE |
> +		      intel_pmu_arch_events[idx]);
> +		wrmsr(counter_msr + i, 0);
> +		__asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES}));
> +
> +		if (pmu_is_intel_event_stable(idx))
> +			GUEST_ASSERT(!_rdpmc(i));
> +	}
> +
> +	GUEST_DONE();
> +}
> +
> +static void guest_measure_loop(uint8_t idx)
> +{
> +	const struct {
> +		struct kvm_x86_pmu_feature gp_event;
> +	} intel_event_to_feature[] = {
> +		[INTEL_ARCH_CPU_CYCLES]		   = { X86_PMU_FEATURE_CPU_CYCLES },
> +		[INTEL_ARCH_INSTRUCTIONS_RETIRED]  = { X86_PMU_FEATURE_INSNS_RETIRED },
> +		[INTEL_ARCH_REFERENCE_CYCLES]	   = { X86_PMU_FEATURE_REFERENCE_CYCLES },
> +		[INTEL_ARCH_LLC_REFERENCES]	   = { X86_PMU_FEATURE_LLC_REFERENCES },
> +		[INTEL_ARCH_LLC_MISSES]		   = { X86_PMU_FEATURE_LLC_MISSES },
> +		[INTEL_ARCH_BRANCHES_RETIRED]	   = { X86_PMU_FEATURE_BRANCH_INSNS_RETIRED },
> +		[INTEL_ARCH_BRANCHES_MISPREDICTED] = { X86_PMU_FEATURE_BRANCHES_MISPREDICTED },
> +	};
> +
> +	uint32_t nr_gp_counters = this_cpu_property(X86_PROPERTY_PMU_NR_GP_COUNTERS);
> +	uint32_t pmu_version = this_cpu_property(X86_PROPERTY_PMU_VERSION);
> +	struct kvm_x86_pmu_feature gp_event;
> +	uint32_t counter_msr;
> +	unsigned int i;
> +
> +	if (rdmsr(MSR_IA32_PERF_CAPABILITIES) & PMU_CAP_FW_WRITES)
> +		counter_msr = MSR_IA32_PMC0;
> +	else
> +		counter_msr = MSR_IA32_PERFCTR0;
> +
> +	gp_event = intel_event_to_feature[idx].gp_event;
> +	TEST_ASSERT_EQ(idx, gp_event.f.bit);
> +
> +	if (pmu_version < 2) {
> +		guest_measure_pmu_v1(gp_event, counter_msr, nr_gp_counters);

Looking at this again, testing guest PMU version 1 is practically impossible
because this testcase doesn't force the guest PMU version.  I.e. unless I'm
missing something, this requires old hardware or running in a VM with its PMU
forced to '1'.

And if all subtests use similar inputs, the common configuration can be shoved
into pmu_vm_create_with_one_vcpu().

It's easy enough to fold test_intel_arch_events() into test_intel_counters(),
which will also provide coverage for running with full-width writes enabled.  The
only downside is that the total runtime will be longer.

> +static void test_arch_events_cpuid(uint8_t i, uint8_t j, uint8_t idx)
> +{
> +	uint8_t arch_events_unavailable_mask = BIT_ULL(j);
> +	uint8_t arch_events_bitmap_size = BIT_ULL(i);
> +	struct kvm_vcpu *vcpu;
> +	struct kvm_vm *vm;
> +
> +	vm = pmu_vm_create_with_one_vcpu(&vcpu, guest_measure_loop);
> +
> +	vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EBX_BIT_VECTOR_LENGTH,
> +				arch_events_bitmap_size);
> +	vcpu_set_cpuid_property(vcpu, X86_PROPERTY_PMU_EVENTS_MASK,
> +				arch_events_unavailable_mask);
> +
> +	vcpu_args_set(vcpu, 1, idx);
> +
> +	run_vcpu(vcpu);
> +
> +	kvm_vm_free(vm);
> +}
> +
> +static void test_intel_arch_events(void)
> +{
> +	uint8_t idx, i, j;
> +
> +	for (idx = 0; idx < NR_INTEL_ARCH_EVENTS; idx++) {

There's no need to iterate over each event in the host; we can simply add a wrapper
for guest_measure_loop() in the guest.  That'll be slightly faster since it won't
require creating and destroying a VM for every event.

> +		/*
> +		 * A brute force iteration of all combinations of values is
> +		 * likely to exhaust the limit of the single-threaded thread
> +		 * fd nums, so it's test by iterating through all valid
> +		 * single-bit values.
> +		 */
> +		for (i = 0; i < NR_INTEL_ARCH_EVENTS; i++) {

This is flawed/odd.  'i' becomes arch_events_bitmap_size, i.e. it's a length,
but the length is computed by BIT(i).  That's nonsensical and will eventually
result in undefined behavior.  Oof, that'll actually happen sooner rather than
later because arch_events_bitmap_size is only a single byte, i.e. when the number
of events hits 9, this will try to shove 256 into an 8-bit variable.
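
To make the size problem concrete, here is a minimal standalone sketch (not
part of the patch).  BIT_ULL() is redefined locally as a stand-in for the kernel
macro, and bitmap_size_as_in_patch() is a hypothetical helper that mirrors the
assignment in test_arch_events_cpuid().  The conversion to uint8_t keeps only
the low 8 bits, so the "length" silently becomes 0 once 'i' reaches 8.

```c
#include <stdint.h>

/* Local stand-in for the kernel's BIT_ULL(); an assumption for this sketch. */
#define BIT_ULL(n) (1ULL << (n))

/*
 * Hypothetical helper mirroring the patch: the "length" is computed as a
 * single bit and then stored in an 8-bit variable.
 */
static uint8_t bitmap_size_as_in_patch(uint8_t i)
{
	/* Only the low 8 bits survive the conversion to uint8_t. */
	return (uint8_t)BIT_ULL(i);
}
```

With 'i' in 0..7 the result is 1, 2, 4, ... 128, none of which is a sensible
length for a handful of events; with 'i' == 8 the result is 0.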

The more correct approach would be to pass in 0..NR_INTEL_ARCH_EVENTS inclusive
as the size.  But I think we should actually test 0..length+1, where "length" is
the max of the native length and NR_INTEL_ARCH_EVENTS, i.e. we should verify
KVM handles a size larger than the native length.
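
A rough sketch of how that upper bound could be computed, purely for
illustration.  max_tested_length() is a hypothetical name, the native length is
passed in as a plain argument rather than read from CPUID, and
NR_INTEL_ARCH_EVENTS is hardcoded to 7 to match the seven entries in the quoted
intel_event_to_feature[] table.

```c
#include <stdint.h>

/* Assumed to match the seven events in the quoted table. */
#define NR_INTEL_ARCH_EVENTS 7

/*
 * Hypothetical helper: returns the largest bit vector length to test, i.e.
 * one more than max(native length, NR_INTEL_ARCH_EVENTS).  The host loop
 * would then run 'len' from 0 to this value, inclusive.
 */
static uint8_t max_tested_length(uint8_t native_length)
{
	uint8_t length = native_length > NR_INTEL_ARCH_EVENTS ?
			 native_length : NR_INTEL_ARCH_EVENTS;

	return length + 1;
}
```

E.g. a host loop of the form "for (len = 0; len <= max_tested_length(native);
len++)" would cover the empty vector, every in-range length, and one length
beyond what either the hardware or the test knows about.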

> +			for (j = 0; j < NR_INTEL_ARCH_EVENTS; j++)
> +				test_arch_events_cpuid(i, j, idx);

And here, I think it makes sense to brute force all possible values for at least
one configuration.  There aren't actually _that_ many values, e.g. currently it's
64 (I think).  E.g. test the native PMU version with the "full" length, and then
test single bits with varying lengths.
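
One possible shape for that split, as a standalone sketch rather than selftest
code.  enumerate_configs() and config_fn are hypothetical names; the callback
stands in for whatever creates a VM and runs the guest with a given length and
unavailable-events mask.  The sketch assumes nr_events <= 8 so that a mask fits
in a uint8_t.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical callback: run one subtest with this length and mask. */
typedef void (*config_fn)(uint8_t length, uint8_t unavailable_mask);

/*
 * Hypothetical helper: visits every (length, mask) pair described above and
 * returns how many pairs were visited.  Assumes nr_events <= 8.
 */
static unsigned int enumerate_configs(uint8_t nr_events, uint8_t max_length,
				      config_fn fn)
{
	unsigned int nr_configs = 0;
	uint32_t mask;
	uint8_t len, bit;

	/* Pass 1: brute force every mask value at the "full" length. */
	for (mask = 0; mask < (1u << nr_events); mask++) {
		if (fn)
			fn(nr_events, (uint8_t)mask);
		nr_configs++;
	}

	/* Pass 2: single-bit masks for each length in 0..max_length. */
	for (len = 0; len <= max_length; len++) {
		for (bit = 0; bit < nr_events; bit++) {
			if (fn)
				fn(len, (uint8_t)(1u << bit));
			nr_configs++;
		}
	}

	return nr_configs;
}
```

Passing NULL for the callback just counts configurations, which is a cheap way
to sanity check the total number of VMs a given split would create.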

I'll send a v6 later this week.


