On Thu, Oct 26, 2023, Sean Christopherson wrote: > On Thu, Oct 26, 2023, Mingwei Zhang wrote: > > > +static bool pmu_is_intel_event_stable(uint8_t idx) > > > +{ > > > + switch (idx) { > > > + case INTEL_ARCH_CPU_CYCLES: > > > + case INTEL_ARCH_INSTRUCTIONS_RETIRED: > > > + case INTEL_ARCH_REFERENCE_CYCLES: > > > + case INTEL_ARCH_BRANCHES_RETIRED: > > > + return true; > > > + default: > > > + return false; > > > + } > > > +} > > > > Brief explanation on why other events are not stable please. Since there > > are only a few architecture events, maybe listing all of them with > > explanation in comments would work better. > > Heh, I've already rewritten this logic to make > > > > > + > > > +static void guest_measure_pmu_v1(struct kvm_x86_pmu_feature event, > > > + uint32_t counter_msr, uint32_t nr_gp_counters) > > > +{ > > > + uint8_t idx = event.f.bit; > > > + unsigned int i; > > > + > > > + for (i = 0; i < nr_gp_counters; i++) { > > > + wrmsr(counter_msr + i, 0); > > > + wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS | > > > + ARCH_PERFMON_EVENTSEL_ENABLE | intel_pmu_arch_events[idx]); > > > + __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES})); > > > > Some comment might be needed for readability. Abuptly inserting inline > > assembly code in C destroys the readability. > > > > I wonder do we need add 'clobber' here for the above line, since it > > takes away ecx? > > It's already there. You can't directly clobber a register that is used as an > input constraint. The workaround is to make the register both an input and an > output, hense the "+c" in the outputs section instead of just "c" in the inputs > section. The extra bit of cleverness is to use an intermediate anonymous variable > so that NUM_BRANCHES can effectively be passed in (#defines won't work as output > constraints). > > > Also, I wonder if we need to disable IRQ here? This code might be > > intercepted and resumed. If so, then the test will get a different > > number? > > This is guest code, disabling IRQs is pointless. There are no guest virtual IRQs, > guarding aginst host IRQs is impossible, unnecessary, and actualy undesirable, > i.e. the guest vPMU shouldn't be counting host instructions and whatnot. > > > > + > > > + if (pmu_is_intel_event_stable(idx)) > > > + GUEST_ASSERT_EQ(this_pmu_has(event), !!_rdpmc(i)); > > > > Okay, just the counter value is non-zero means we pass the test ?! > > FWIW, I've updated > > > hmm, I wonder other than IRQ stuff, what else may affect the result? NMI > > watchdog or what? > > This is the beauty of selftests. There _so_ simple that there are very few > surprises. E.g. there are no events of any kind unless the test explicitly > generates them. The downside is that doing anything complex in selftests requires > writing a fair bit of code. Understood, so we could support precise matching. > > > > + > > > + wrmsr(MSR_P6_EVNTSEL0 + i, ARCH_PERFMON_EVENTSEL_OS | > > > + !ARCH_PERFMON_EVENTSEL_ENABLE | > > > + intel_pmu_arch_events[idx]); > > > + wrmsr(counter_msr + i, 0); > > > + __asm__ __volatile__("loop ." : "+c"((int){NUM_BRANCHES})); > > ditto for readability. Please consider using a macro to avoid repeated > > explanation. > > Heh, already did this too. Though I'm not entirely sure it's more readable. It's > definitely more precise and featured :-) > Oh dear, this is challenging to my rusty inline assembly skills :) > #define GUEST_MEASURE_EVENT(_msr, _value, clflush, FEP) \ > do { \ > __asm__ __volatile__("wrmsr\n\t" \ > clflush "\n\t" \ > "mfence\n\t" \ > "1: mov $" __stringify(NUM_BRANCHES) ", %%ecx\n\t" \ > FEP "loop .\n\t" \ > FEP "mov %%edi, %%ecx\n\t" \ > FEP "xor %%eax, %%eax\n\t" \ > FEP "xor %%edx, %%edx\n\t" \ > "wrmsr\n\t" \ > : "+c"((int){_msr}) \ isn't it NUM_BRANCHES? > : "a"((uint32_t)_value), "d"(_value >> 32), \ > "D"(_msr) \ > ); \ > } while (0) > do we need this label '1:' in the above code? It does not seems to be used anywhere within the code. why is clflush needed here? > > > > +int main(int argc, char *argv[]) > > > +{ > > > + TEST_REQUIRE(get_kvm_param_bool("enable_pmu")); > > > + > > > + TEST_REQUIRE(host_cpu_is_intel); > > > + TEST_REQUIRE(kvm_cpu_has_p(X86_PROPERTY_PMU_VERSION)); > > > + TEST_REQUIRE(kvm_cpu_property(X86_PROPERTY_PMU_VERSION) > 0); > > > + TEST_REQUIRE(kvm_cpu_has(X86_FEATURE_PDCM)); > > > > hmm, this means we cannot run this in nested if X86_FEATURE_PDCM is > > missing. It only affects full-width counter, right? > > Ah, yeah, good call. It won't be too much trouble to have the test play nice > with !PDCM.