On Mon, Jun 27, 2022 at 8:54 PM Nadav Amit <nadav.amit@xxxxxxxxx> wrote:

> The failure on bare-metal that I experienced hints that this is either a test
> bug or (much less likely) a hardware bug. But I do not think it is likely to be
> a KVM bug.

KVM does not use the VMX-preemption timer to virtualize L1's
VMX-preemption timer (and that is why KVM is broken). The KVM bug was
introduced with commit f4124500c2c1 ("KVM: nVMX: Fully emulate
preemption timer"), which uses an L0 CLOCK_MONOTONIC hrtimer to
emulate L1's VMX-preemption timer. There are many reasons that this
cannot possibly work, not the least of which is that the
CLOCK_MONOTONIC timer is subject to time slew.

Currently, KVM reserves L0's VMX-preemption timer for emulating L1's
APIC timer. Better would be to determine whether L1's APIC timer or
L1's VMX-preemption timer is scheduled to fire first, and use L0's
VMX-preemption timer to trigger a VM-exit on the nearest alarm.
Alternatively, as Sean noted, one could perhaps arrange for the
hrtimer to fire early enough that it won't fire late, but I don't
really think that's a viable solution.

I can't explain the bare-metal failures, but I will note that the test
assumes the default treatment of SMIs and SMM. The test will likely
fail with the dual-monitor treatment of SMIs and SMM. Aside from the
older CPUs with broken VMX-preemption timers, I don't know of any
relevant errata.

Of course, it is possible that the test itself is buggy. For the
person who reported bare-metal failures on Ice Lake and Cooper Lake,
how long was the test in VMX non-root mode past the VMX-preemption
timer deadline?
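
To make the "nearest alarm" suggestion above concrete, here is a rough
userspace sketch, not KVM code. It assumes both of L1's pending
deadlines (APIC timer and VMX-preemption timer) have already been
converted to absolute host TSC values, and that rate_shift holds
IA32_VMX_MISC[4:0]. The struct and function names are made up for
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one pending alarm as an absolute host TSC deadline. */
struct alarm {
	bool armed;
	uint64_t deadline_tsc;
};

/* Pick the earlier of two alarms. Returns false if neither is armed. */
static bool nearest_deadline(const struct alarm *a, const struct alarm *b,
			     uint64_t *out)
{
	if (a->armed && (!b->armed || a->deadline_tsc <= b->deadline_tsc)) {
		*out = a->deadline_tsc;
		return true;
	}
	if (b->armed) {
		*out = b->deadline_tsc;
		return true;
	}
	return false;
}

/*
 * Convert an absolute TSC deadline into a 32-bit VMX-preemption timer
 * value. The timer counts down once every 2^rate_shift TSC cycles.
 * A deadline already in the past yields 0. Returns false if the
 * resulting count does not fit in 32 bits.
 */
static bool tsc_deadline_to_preemption_ticks(uint64_t now_tsc,
					     uint64_t deadline_tsc,
					     unsigned int rate_shift,
					     uint32_t *ticks)
{
	uint64_t delta = deadline_tsc > now_tsc ? deadline_tsc - now_tsc : 0;
	uint64_t t = delta >> rate_shift;

	if (t > UINT32_MAX)
		return false;
	*ticks = (uint32_t)t;
	return true;
}
```

The right shift rounds down, so in this sketch the VM-exit can land
slightly before the true deadline but never after it. The exit handler
would then have to work out which alarm (if any) is actually due and
re-arm the timer for whatever remains.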