Re: vmx_pmu_caps_test fails on Skylake based CPUS due to read only LBRs

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Jan 24, 2025, Maxim Levitsky wrote:
> On Wed, 2025-01-22 at 13:02 -0800, Sean Christopherson wrote:
> > > > > Note that I read all msrs using 'rdmsr' userspace tool.
> > > > 
> > > > I'm pretty sure debugging via 'rdmsr', i.e. /dev/msr, isn't going to work.  I
> > > > assume perf is clobbering LBR MSRs on context switch, but I haven't tracked that
> > > > down to confirm (the code I see on inspecition is gated on at least one perf
> > > > event using LBRs).  My guess is that there's a software bug somewhere in the
> > > > perf/KVM exchange.
> > > > 
> > > > I confirmed that using 'rdmsr' and 'wrmsr' "loses" values, but that hacking KVM
> > > > to read/write all LBRs during initialization works with LBRs disabled.
> 
> Hi!
> 
> I finally got to the very bottom of this:
> 
> First of all, your assumption that the kernel resets LBR related msrs on
> context switch after 'wrmsr' program finishes execution is wrong, because the
> kernel will only do this if it *itself* enables the LBR feature (that is when
> something like 'perf', uses a perf counter with a lbr call stack).
> 
> Writes that 'wrmsr' tool does are not something that kernel expects so it
> doesn't do anything in this case.
> 
> What is happening instead, is something completely different: Turns out that
> to shave off something like 50 nanoseconds, off the deep C-state entry/exit
> latency, some Intel CPU don't preserve LBR stack values over these C-state
> entries.

Ugh.

> > Ugh, but it does.  On writes to any LBR, including LBR_TOS, KVM creates a "virtual"
> > LBR perf event.  KVM then relies on perf to context switch LBR MSRs, i.e. relies
> > on perf to load the guest's values into hardware.  At least, I think that's what
> > is supposed to happen.  AFAIK, the perf-based LBR support has never been properly
> > document[*].
> > 
> > Anyways, my understanding of intel_pmu_handle_lbr_msrs_access() is that if the
> > vCPU's LBR perf event is scheduled out or can't be created, the guest's value is
> > effectively lost.  Again, I don't know the "rules" for the LBR perf event, but
> > it wouldn't suprise me if your CI fails because something in the host conflicts
> > with KVM's LBR perf event.
> 
> Actually you are partially wrong here too (although BIOS can be considered
> 'something on the host').
> 
> I was able to prove that the reason why the unit test fails *is* because BIOS
> left LBRs enabled:
> 
> First of all, setting LBR bit manually in DEBUG_CTL does trigger this bug 
> (I use a different machine now, which doesn't have the bios bug):

...

> ==== Test Assertion Failure ====
>   x86_64/vmx_pmu_caps_test.c:202: r == v
>   pid=8415 tid=8415 errno=0 - Success
>      1	0x0000000000404301: __suite_lbr_perf_capabilities at vmx_pmu_caps_test.c:202
>      2	 (inlined by) vmx_pmu_caps_lbr_perf_capabilities at vmx_pmu_caps_test.c:194
>      3	 (inlined by) wrapper_vmx_pmu_caps_lbr_perf_capabilities at vmx_pmu_caps_test.c:194
>      4	0x000000000040511a: __run_test at kselftest_harness.h:1240
>      5	0x0000000000402b95: test_harness_run at kselftest_harness.h:1310
>      6	 (inlined by) main at vmx_pmu_caps_test.c:246
>      7	0x00007f56ba2295cf: ?? ??:0
>      8	0x00007f56ba22967f: ?? ??:0
>      9	0x0000000000402e44: _start at ??:?
>   Set MSR_LBR_TOS to '0x7', got back '0xc'
> # lbr_perf_capabilities: Test failed
> #          FAIL  vmx_pmu_caps.lbr_perf_capabilities
> not ok 5 vmx_pmu_caps.lbr_perf_capabilities
> #  RUN           vmx_pmu_caps.perf_capabilities_unsupported ...
> #            OK  vmx_pmu_caps.perf_capabilities_unsupported
> ok 6 vmx_pmu_caps.perf_capabilities_unsupported
> # FAILED: 5 / 6 tests passed.
> # Totals: pass:5 fail:1 xfail:0 xpass:0 skip:0 error:0
> 
> Secondary I went over all places in the kernel and all of them take care to
> preserve DEBUG_CTL and only set/clear specific bits.
> 
> __intel_pmu_lbr_enable() and __intel_pmu_lbr_enable() are practically the
> only two places where DEBUGCTLMSR_LBR bit is touched, and the test doesn't
> trigger them. Most likely because the test uses special
> 'INTEL_FIXED_VLBR_EVENT' perf event (see intel_pmu_create_guest_lbr_event)
> which is not enabled while in host mode.
> 
> To double check this I traced all writes to DEBUG_CTL msr during this test
> and the only write is done during 'guest_wrmsr_perf_capabilities' subtest, by
> vmx_vcpu_run() which just restores the value that the msr had prior to VM
> entry.
> 
> So, why the value that BIOS sets survives? Because as I said all code that
> touches DEBUG_CTL takes care to preserve all bits but the bit which is
> changed, LBRs are never enabled on the host, and even the guest entry
> preserves host DEBUG_CTL.  Therefore the value written by BIOS survives.

Well that's rather insane.

> So we end up with the test writing to LBR_TOS while LBRs are unexpectedly
> enabled, so it's not a surprise that when the test reads back the value
> written, it will differ, and the test will rightfully fail.
> 
> Since we have seen this in CI, and you saw it too in your CI,

Gah, that was bad reporting on my end.  The failure we saw was something else
entirely.

> I think this BIOS bug is not that rare, and so I suggest to stick
> 'wrmsrl(MSR_IA32_DEBUGCTLMSR, 0)' somewhere early in a kernel boot code or at
> least clear the DEBUGCTLMSR_LBR bit.
> 
> I haven't found a very good place to put this, in a way that I can be sure
> that x86 maintainers won't reject it, so I am open to your suggestions.

Compile tested only, but perf's CPU online path seems appropriate, especially
since that path also explicitly clears LBRs.  Ensuring LBRs are stopped before
clearing them seems logical.

diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index 99c590da0ae2..6e898b832d75 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -5030,8 +5030,12 @@ static void intel_pmu_cpu_starting(int cpu)
 
        init_debug_store_on_cpu(cpu);
        /*
-        * Deal with CPUs that don't clear their LBRs on power-up.
+        * Deal with CPUs that don't clear their LBRs on power-up, and with
+        * BIOSes that leave LBRs enabled.
         */
+       if (!static_cpu_has(X86_FEATURE_ARCH_LBR) && x86_pmu.lbr_nr)
+               msr_clear_bit(MSR_IA32_DEBUGCTLMSR, DEBUGCTLMSR_LBR_BIT);
+
        intel_pmu_lbr_reset();
 
        cpuc->lbr_sel = NULL;
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 3ae84c3b8e6d..bb7dd85aa6f2 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -395,7 +395,8 @@
 #define MSR_IA32_PASID_VALID           BIT_ULL(31)
 
 /* DEBUGCTLMSR bits (others vary by model): */
-#define DEBUGCTLMSR_LBR                        (1UL <<  0) /* last branch recording */
+#define DEBUGCTLMSR_LBR_BIT            0
+#define DEBUGCTLMSR_LBR                        (1UL <<  DEBUGCTLMSR_LBR_BIT) /* last branch recording */
 #define DEBUGCTLMSR_BTF_SHIFT          1
 #define DEBUGCTLMSR_BTF                        (1UL <<  1) /* single-step on branches */
 #define DEBUGCTLMSR_BUS_LOCK_DETECT    (1UL <<  2)





[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux