[Bug 216177] kvm-unit-tests vmx has about 60% of failure chance

https://bugzilla.kernel.org/show_bug.cgi?id=216177

--- Comment #10 from Jim Mattson (jmattson@xxxxxxxxxx) ---
On Mon, Jun 27, 2022 at 11:32 PM <bugzilla-daemon@xxxxxxxxxx> wrote:
>
> https://bugzilla.kernel.org/show_bug.cgi?id=216177
>
> --- Comment #9 from Yang Lixiao (lixiao.yang@xxxxxxxxx) ---
> (In reply to Jim Mattson from comment #8)
> > On Mon, Jun 27, 2022 at 8:54 PM Nadav Amit <nadav.amit@xxxxxxxxx> wrote:
> >
> > > The failure on bare-metal that I experienced hints that this is either a
> > > test bug or (much less likely) a hardware bug. But I do not think it is
> > > likely to be a KVM bug.
> >
> > KVM does not use the VMX-preemption timer to virtualize L1's
> > VMX-preemption timer (and that is why KVM is broken). The KVM bug was
> > introduced with commit f4124500c2c1 ("KVM: nVMX: Fully emulate
> > preemption timer"), which uses an L0 CLOCK_MONOTONIC hrtimer to
> > emulate L1's VMX-preemption timer. There are many reasons that this
> > cannot possibly work, not the least of which is that the
> > CLOCK_MONOTONIC timer is subject to time slew.
> >
> > Currently, KVM reserves L0's VMX-preemption timer for emulating L1's
> > APIC timer. Better would be to determine whether L1's APIC timer or
> > L1's VMX-preemption timer is scheduled to fire first, and use L0's
> > VMX-preemption timer to trigger a VM-exit on the nearest alarm.
> > Alternatively, as Sean noted, one could perhaps arrange for the
> > hrtimer to fire early enough that it won't fire late, but I don't
> > really think that's a viable solution.
> >
> > I can't explain the bare-metal failures, but I will note that the test
> > assumes the default treatment of SMIs and SMM. The test will likely
> > fail with the dual-monitor treatment of SMIs and SMM. Aside from the
> > older CPUs with broken VMX-preemption timers, I don't know of any
> > relevant errata.
> >
> > Of course, it is possible that the test itself is buggy. For the
> > person who reported bare-metal failures on Ice Lake and Cooper Lake,
> > how long was the test in VMX non-root mode past the VMX-preemption
> > timer deadline?
>
> On the first Ice Lake:
> Test suite: vmx_preemption_timer_expiry_test
> FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)
>
> On the second Ice Lake:
> Test suite: vmx_preemption_timer_expiry_test
> FAIL: Last stored guest TSC (27014488614) < TSC deadline (27014469152)
>
> On Cooper Lake:
> Test suite: vmx_preemption_timer_expiry_test
> FAIL: Last stored guest TSC (29030585690) < TSC deadline (29030565024)

Wow! Those are *huge* overruns. What is the value of MSR 0x9B on these hosts?
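
For reference, the size of each overrun can be read straight off the FAIL lines quoted above. What follows is only a rough sketch: the host labels, the regular expression, and the helper name overrun_cycles are illustrative choices and not part of kvm-unit-tests, and the result is in raw TSC cycles because the TSC frequency of these hosts was not reported.

```python
import re

# Failure lines copied verbatim from comment #9. The dictionary keys are
# illustrative labels, not names used by the test suite.
REPORTS = {
    "Ice Lake #1": "FAIL: Last stored guest TSC (28067103426) < TSC deadline (28067086048)",
    "Ice Lake #2": "FAIL: Last stored guest TSC (27014488614) < TSC deadline (27014469152)",
    "Cooper Lake": "FAIL: Last stored guest TSC (29030585690) < TSC deadline (29030565024)",
}

# The message states the condition the test expected to hold
# (last stored guest TSC < TSC deadline), so a FAIL means the guest was
# still running after the deadline had passed.
PATTERN = re.compile(r"Last stored guest TSC \((\d+)\) < TSC deadline \((\d+)\)")


def overrun_cycles(line: str) -> int:
    """Return how many TSC cycles the last stored guest TSC is past the deadline."""
    match = PATTERN.search(line)
    if match is None:
        raise ValueError(f"unrecognized report line: {line!r}")
    last_stored, deadline = (int(value) for value in match.groups())
    return last_stored - deadline


OVERRUNS = {host: overrun_cycles(line) for host, line in REPORTS.items()}

if __name__ == "__main__":
    for host, cycles in OVERRUNS.items():
        print(f"{host}: {cycles} TSC cycles past the deadline")
```

By this arithmetic the three hosts ran 17378, 19462, and 20666 TSC cycles past the deadline, respectively. Converting those figures to time would require each host's TSC frequency, which is not given in this thread.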

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.


