On Mon, 2018-10-01 at 19:06 +0200, Paolo Bonzini wrote:
> On 26/09/2018 11:41, Nikita Leshenko wrote:
> >
> > On 24 Sep 2018, at 17:21, Paolo Bonzini <pbonzini@xxxxxxxxxx> wrote:
> > >
> > > On 23/09/2018 16:19, Nikita Leshenko wrote:
> > > >
> > > > I tried to reproduce the failure - I ran both linked tests on Linux master
> > > > (a83f87c1d2a93) but they pass.
> > > >
> > > > Can you post your failure log?
> > > >
> > > > Upstream kernel on an Ubuntu 18.04 with QEMU 2.11.1. The tests I ran:
> > > > - vmx_pending_event_test
> > > > - vmx_pending_event_hlt_test
> > > My failure log is the following:
> > >
> > > Test suite: vmx_pending_event_test
> > > FAIL: x86/vmx_tests.c:2133: Assertion failed: (expected) == (actual)
> > > LHS: 0x0000000000000001 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001 - 1
> > > RHS: 0x0000000000000012 - 0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0000'0001'0010 - 18
> > > Expected VMX_EXTINT, got VMX_VMCALL.
> > > STACK: 40579c 405919 4059a3 401d4e 4038ee 400312
> > > SUMMARY: 7 tests, 1 unexpected failures
> > >
> > > This is on a Haswell-EP (Xeon E5 v3) machine. I can try tomorrow on a Kaby Lake
> > > laptop too (I'm not very well equipped to do kernel development there so I'd
> > > rather wait for Fedora's 4.19-rc5 kernel to be available).
> > Could the failure be related to commit d264ee0c2 (KVM: VMX: use preemption timer
> > to force immediate VMExit)?
> >
> > Given the multiple erratas that exist on VMX preemption timer and because this
> > test requires immediate exit, I think it's worth doing a checkout of commit
> > b5861e5cf (KVM: nVMX: Fix loss of pending IRQ/NMI before entering L2) directly
> > (before the preemption timer changes are present) and running the tests again.
> > They still pass on my Skylake-SP (Xeon Platinum 8167M) and I wonder if the
> > results on your CPU will be different.
> Yeah, they pass on Kaby Lake too (Core i7-7600U) so I think we should
> re-enable smp_send_reschedule on pre-Skylake processor. Sean, what do
> you think?

That's not good. The errata I'm aware of relate to the timer counting
at the wrong frequency. A timer value of zero is (supposed to be)
special cased by the VMEnter ucode and shouldn't be subject to the
known errata. In other words, this would be a new bug/errata if the
timer is armed with a value of zero and you're not seeing a VMExit.
Either that or the existing errata is *really* poorly worded.

What happens if you change the test to loop indefinitely instead of
doing VMCALL? Do you ever see a timer VMExit or does it hang?

> (And, thanks for the nice testcase finding bugs in other patches too. :))
>
> Paolo
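
For reference when reading the failure log quoted above, here is a minimal sketch that decodes the LHS/RHS values as basic VMX exit reasons. The numeric values (1, 18, 52) are the basic exit reasons listed in the Intel SDM; the names VMX_EXTINT and VMX_VMCALL come from the log, while VMX_PREEMPT and the helper exit_reason_name are assumptions made for illustration and are not taken from this thread.

```python
# Basic VMX exit reasons relevant to this thread (values per the Intel SDM).
# VMX_EXTINT and VMX_VMCALL appear in the failure log; VMX_PREEMPT is an
# assumed name for the "VMX-preemption timer expired" exit.
EXIT_REASONS = {
    1: "VMX_EXTINT",    # external interrupt
    18: "VMX_VMCALL",   # guest executed VMCALL
    52: "VMX_PREEMPT",  # VMX-preemption timer expired
}


def exit_reason_name(value):
    """Return a symbolic name for a raw exit-reason value (hypothetical helper)."""
    basic = value & 0xFFFF  # bits 15:0 hold the basic exit reason
    return EXIT_REASONS.get(basic, "unknown (%d)" % basic)


# LHS (expected) and RHS (actual) values from the failure log.
expected = exit_reason_name(0x0000000000000001)
actual = exit_reason_name(0x0000000000000012)
print("Expected %s, got %s." % (expected, actual))
```

Read this way, the log says L2 reached its VMCALL before any exit was forced on it. In the loop-forever experiment, an exit with reason 52 would be the timer VMExit asked about above.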