+Dapeng On Tue, Jan 14, 2025, kernel test robot wrote: > we fould the test failed on a Cooper Lake, not sure if this is expected. > below full report FYI. > > > kernel test robot noticed "kernel-selftests.kvm.pmu_counters_test.fail" on: > > commit: 7803339fa929387bbc66479532afbaf8cbebb41b ("KVM: selftests: Use data load to trigger LLC references/misses in Intel PMU") > https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master > > [test failed on linux-next/master 37136bf5c3a6f6b686d74f41837a6406bec6b7bc] > > in testcase: kernel-selftests > version: kernel-selftests-x86_64-7503345ac5f5-1_20241208 > with following parameters: > > group: kvm > > config: x86_64-rhel-9.4-kselftests > compiler: gcc-12 > test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory *sigh* This fails on our Skylake and Cascade Lake systems, but I only tested an Emerald Rapids. > # Testing fixed counters, PMU version 0, perf_caps = 2000 > # Testing arch events, PMU version 1, perf_caps = 0 > # ==== Test Assertion Failure ==== > # x86/pmu_counters_test.c:129: count >= (10 * 4 + 5) > # pid=6278 tid=6278 errno=4 - Interrupted system call > # 1 0x0000000000411281: assert_on_unhandled_exception at processor.c:625 > # 2 0x00000000004075d4: _vcpu_run at kvm_util.c:1652 > # 3 (inlined by) vcpu_run at kvm_util.c:1663 > # 4 0x0000000000402c5e: run_vcpu at pmu_counters_test.c:62 > # 5 0x0000000000402e4d: test_arch_events at pmu_counters_test.c:315 > # 6 0x0000000000402663: test_arch_events at pmu_counters_test.c:304 > # 7 (inlined by) test_intel_counters at pmu_counters_test.c:609 > # 8 (inlined by) main at pmu_counters_test.c:642 > # 9 0x00007f3b134f9249: ?? ??:0 > # 10 0x00007f3b134f9304: ?? ??:0 > # 11 0x0000000000402900: _start at ??:? > # count >= NUM_INSNS_RETIRED The failure is on top-down slots. I modified the assert to actually print the count (I'll make sure to post a patch regardless of where this goes), and based on the count for failing vs. passing, I'm pretty sure the issue is not the extra instruction, but instead is due to changing the target of the CLFUSH from the address of the code to the address of kvm_pmu_version. However, I think the blame lies with the assertion itself, i.e. with commit 4a447b135e45 ("KVM: selftests: Test top-down slots event in x86's pmu_counters_test"). Either that or top-down slots is broken on the Lakes. By my rudimentary measurements, tying the number of available slots to the number of instructions *retired* is fundamentally flawed. E.g. on the Lakes (SKX is more or less identical to CLX), omitting the CLFLUSHOPT entirely results in *more* slots being available throughout the lifetime of the measured section. My best guess is that flushing the cache line use for the data load causes the backend to saturate its slots with prefetching data, and as a result the number of slots that are available goes down. CLFLUSHOPT . | CLFLUSHOPT [%m] | NOP CLX 350-100 | 20-60[*] | 135-150 SPR 49000-57000 | 32500-41000 | 6760-6830 [*] CLX had a few outliers in the 200-400 range, but the majority of runs were in the 20-60 range. Reading through more (and more and more) of the TMA documentation, I don't think we can assume anything about the number of available slots, beyond a very basic assertion that it's practically impossible for there to never be an available slot. IIUC, retiring an instruction does NOT require an available slot, rather it requires the opposite: an occupied slot for the uop(s). I'm mildly curious as to why the counts for SPR are orders of magnitude higher that CLX (simple accounting differences?), but I don't think it changes anything in the test itself. Unless someone has a better idea, my plan is to post a patch to assert that the top-down slots count is non-zero, not that it's >= instructions retired. E.g. diff --git a/tools/testing/selftests/kvm/x86/pmu_counters_test.c b/tools/testing/selftests/kvm/x86/pmu_counters_test.c index accd7ecd3e5f..21acedcd46cd 100644 --- a/tools/testing/selftests/kvm/x86/pmu_counters_test.c +++ b/tools/testing/selftests/kvm/x86/pmu_counters_test.c @@ -123,10 +123,8 @@ static void guest_assert_event_count(uint8_t idx, fallthrough; case INTEL_ARCH_CPU_CYCLES_INDEX: case INTEL_ARCH_REFERENCE_CYCLES_INDEX: - GUEST_ASSERT_NE(count, 0); - break; case INTEL_ARCH_TOPDOWN_SLOTS_INDEX: - GUEST_ASSERT(count >= NUM_INSNS_RETIRED); + GUEST_ASSERT_NE(count, 0); break; default: break;