On 1/15/2025 3:47 AM, Sean Christopherson wrote:
> +Dapeng
>
> On Tue, Jan 14, 2025, kernel test robot wrote:
>> we found the test failed on a Cooper Lake, not sure if this is expected.
>> below full report FYI.
>>
>>
>> kernel test robot noticed "kernel-selftests.kvm.pmu_counters_test.fail" on:
>>
>> commit: 7803339fa929387bbc66479532afbaf8cbebb41b ("KVM: selftests: Use data load to trigger LLC references/misses in Intel PMU")
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>
>> [test failed on linux-next/master 37136bf5c3a6f6b686d74f41837a6406bec6b7bc]
>>
>> in testcase: kernel-selftests
>> version: kernel-selftests-x86_64-7503345ac5f5-1_20241208
>> with following parameters:
>>
>> 	group: kvm
>>
>> config: x86_64-rhel-9.4-kselftests
>> compiler: gcc-12
>> test machine: 224 threads 4 sockets Intel(R) Xeon(R) Platinum 8380H CPU @ 2.90GHz (Cooper Lake) with 192G memory
>
> *sigh*
>
> This fails on our Skylake and Cascade Lake systems, but I only tested an Emerald
> Rapids.
>
>> # Testing fixed counters, PMU version 0, perf_caps = 2000
>> # Testing arch events, PMU version 1, perf_caps = 0
>> # ==== Test Assertion Failure ====
>> #   x86/pmu_counters_test.c:129: count >= (10 * 4 + 5)
>> #   pid=6278 tid=6278 errno=4 - Interrupted system call
>> #      1	0x0000000000411281: assert_on_unhandled_exception at processor.c:625
>> #      2	0x00000000004075d4: _vcpu_run at kvm_util.c:1652
>> #      3	 (inlined by) vcpu_run at kvm_util.c:1663
>> #      4	0x0000000000402c5e: run_vcpu at pmu_counters_test.c:62
>> #      5	0x0000000000402e4d: test_arch_events at pmu_counters_test.c:315
>> #      6	0x0000000000402663: test_arch_events at pmu_counters_test.c:304
>> #      7	 (inlined by) test_intel_counters at pmu_counters_test.c:609
>> #      8	 (inlined by) main at pmu_counters_test.c:642
>> #      9	0x00007f3b134f9249: ?? ??:0
>> #     10	0x00007f3b134f9304: ?? ??:0
>> #     11	0x0000000000402900: _start at ??:?
>> #   count >= NUM_INSNS_RETIRED
>
> The failure is on top-down slots.  I modified the assert to actually print the
> count (I'll make sure to post a patch regardless of where this goes), and based
> on the count for failing vs. passing, I'm pretty sure the issue is not the extra
> instruction, but instead is due to changing the target of the CLFLUSH from the
> address of the code to the address of kvm_pmu_version.
>
> However, I think the blame lies with the assertion itself, i.e. with commit
> 4a447b135e45 ("KVM: selftests: Test top-down slots event in x86's pmu_counters_test").
> Either that or top-down slots is broken on the Lakes.
>
> By my rudimentary measurements, tying the number of available slots to the number
> of instructions *retired* is fundamentally flawed.  E.g. on the Lakes (SKX is more
> or less identical to CLX), omitting the CLFLUSHOPT entirely results in *more*
> slots being available throughout the lifetime of the measured section.
>
> My best guess is that flushing the cache line used for the data load causes the
> backend to saturate its slots with prefetching data, and as a result the number
> of slots that are available goes down.
>
>           CLFLUSHOPT .   | CLFLUSHOPT [%m] |    NOP
>      CLX  350-100        |    20-60[*]     |  135-150
>      SPR  49000-57000    |   32500-41000   | 6760-6830
>
> [*] CLX had a few outliers in the 200-400 range, but the majority of runs were
>     in the 20-60 range.
>
> Reading through more (and more and more) of the TMA documentation, I don't think
> we can assume anything about the number of available slots, beyond a very basic
> assertion that it's practically impossible for there to never be an available
> slot.
> IIUC, retiring an instruction does NOT require an available slot, rather
> it requires the opposite: an occupied slot for the uop(s).

I'm not quite sure about this. IIUC, retiring an instruction may not need a
cycle, but it does need at least one slot, unless the instruction is
macro-fused. Anyway, let me double-check with our micro-architecture and perf
experts.

> I'm mildly curious as to why the counts for SPR are orders of magnitude higher
> than CLX (simple accounting differences?), but I don't think it changes anything
> in the test itself.
>
> Unless someone has a better idea, my plan is to post a patch to assert that the
> top-down slots count is non-zero, not that it's >= instructions retired.  E.g.
>
> diff --git a/tools/testing/selftests/kvm/x86/pmu_counters_test.c b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> index accd7ecd3e5f..21acedcd46cd 100644
> --- a/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> +++ b/tools/testing/selftests/kvm/x86/pmu_counters_test.c
> @@ -123,10 +123,8 @@ static void guest_assert_event_count(uint8_t idx,
>  		fallthrough;
>  	case INTEL_ARCH_CPU_CYCLES_INDEX:
>  	case INTEL_ARCH_REFERENCE_CYCLES_INDEX:
> -		GUEST_ASSERT_NE(count, 0);
> -		break;
>  	case INTEL_ARCH_TOPDOWN_SLOTS_INDEX:
> +		GUEST_ASSERT_NE(count, 0);
>  		break;
>  	default:
>  		break;
>
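Just to frame why an exact lower bound is hard: below is my rough sketch of the
TMA level-1 slot accounting as I understand it from the public TMA material.
It is NOT code from the test, and all of the names (the struct, the helper, the
counter parameters) as well as the pipeline-width constant are illustrative
assumptions that vary by micro-architecture. The only point is that retired
instructions map to retiring slots loosely (multi-uop instructions, fusion),
and the other three categories can eat an arbitrary share of the slots, so a
non-zero check looks like the strongest assertion the test can safely make.

  #include <stdint.h>

  /* Assumed issue width; 4 on pre-Golden Cove cores, wider on newer ones. */
  #define ASSUMED_PIPELINE_WIDTH	4

  struct tma_level1 {
  	double retiring;
  	double bad_speculation;
  	double frontend_bound;
  	double backend_bound;
  };

  /*
   * Hypothetical helper: every slot falls into exactly one of the four
   * level-1 buckets, so the fractions sum to 1.0.  "slots" is roughly
   * ASSUMED_PIPELINE_WIDTH * unhalted cycles, i.e. it scales with time,
   * not with the number of instructions retired.
   */
  static struct tma_level1 tma_level1_split(uint64_t slots,
  					  uint64_t retire_slots_used,
  					  uint64_t frontend_stall_slots,
  					  uint64_t bad_spec_slots)
  {
  	struct tma_level1 l1 = {
  		.retiring	 = (double)retire_slots_used / slots,
  		.frontend_bound	 = (double)frontend_stall_slots / slots,
  		.bad_speculation = (double)bad_spec_slots / slots,
  	};

  	l1.backend_bound = 1.0 - l1.retiring - l1.frontend_bound -
  			   l1.bad_speculation;
  	return l1;
  }

Again, just a sketch for discussion; the actual event definitions and counts
clearly differ between the Lakes and SPR, as the numbers above show.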