On Mon, Aug 27, 2018 at 03:21:12PM -0700, Sean Christopherson wrote:
> A VMX preemption timer value of '0' is guaranteed to cause a VMExit
> prior to the CPU executing any instructions in the guest. Use the
> preemption timer (if it's supported) to trigger immediate VMExit
> in place of the current method of sending a self-IPI. This ensures
> that pending VMExit injection to L1 occurs prior to executing any
> instructions in the guest (regardless of nesting level).
>
> When deferring VMExit injection, KVM generates an immediate VMExit
> from the (possibly nested) guest by sending itself an IPI. Because
> hardware interrupts are blocked prior to VMEnter and are unblocked
> (in hardware) after VMEnter, this results in taking a VMExit(INTR)
> before any guest instruction is executed. But, as this approach
> relies on the IPI being received before VMEnter executes, it only
> works as intended when KVM is running as L0. Because there are no
> architectural guarantees regarding when IPIs are delivered, when
> running nested the INTR may "arrive" long after L2 is running e.g.
> L0 KVM doesn't force an immediate switch to L1 to deliver an INTR.

Circling back to this patch now that we have the nested interrupt
handling sorted out, knock on wood...

I think the basic premise of this patch is valid, but the above line
should be stricken from the commit message since KVM's behavior is a
bug.

> For the most part, this unintended delay is not an issue since the
> events being injected to L1 also do not have architectural guarantees
> regarding their timing. The notable exception is the VMX preemption
> timer[1], which is architecturally guaranteed to cause a VMExit prior
> to executing any instructions in the guest if the timer value is '0'
> at VMEnter. Specifically, the delay in injecting the VMExit causes
> the preemption timer KVM unit test to fail when run in a nested guest.
>
> Note: this approach is viable even on CPUs with a broken preemption
> timer, as broken in this context only means the timer counts at the
> wrong rate. There are no known errata affecting timer value of '0'.
>
> [1] I/O SMIs also have guarantees on when they arrive, but I have
>     no idea if/how those are emulated in KVM.
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
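For illustration only, the selection logic described in the quoted commit message can be sketched as a toy C model. It prefers arming the VMX preemption timer with '0', which is architecturally guaranteed to exit before the first guest instruction and remains usable on a "broken" timer that merely counts at the wrong rate. It falls back to the self-IPI only when the timer is unsupported. Every identifier here (`toy_vcpu`, `toy_request_immediate_exit`, `toy_exit_before_first_insn`, and so on) is hypothetical and does not correspond to KVM's actual code.

```c
#include <stdbool.h>

/*
 * Toy model only: these names are hypothetical and are not KVM's
 * actual data structures or functions.
 */
enum toy_exit_method {
	TOY_EXIT_VIA_PREEMPTION_TIMER,	/* arm the VMX preemption timer with 0 */
	TOY_EXIT_VIA_SELF_IPI,		/* fallback: send a self-IPI */
};

struct toy_vcpu {
	bool cpu_has_preemption_timer;	/* hardware supports the timer */
	bool req_immediate_exit;	/* a VMExit must be forced */
	unsigned int preemption_timer_value;
};

/* Pick how to force a VMExit before the guest runs any instruction. */
static enum toy_exit_method toy_request_immediate_exit(struct toy_vcpu *v)
{
	v->req_immediate_exit = true;

	/*
	 * A timer value of 0 exits before the first guest instruction.
	 * A "broken" timer only counts at the wrong rate, so 0 still works.
	 */
	if (v->cpu_has_preemption_timer) {
		v->preemption_timer_value = 0;
		return TOY_EXIT_VIA_PREEMPTION_TIMER;
	}

	/* No timer: fall back to sending a self-IPI. */
	return TOY_EXIT_VIA_SELF_IPI;
}

/*
 * Is the exit guaranteed to happen before any guest instruction?
 * Per the commit message, the self-IPI only works as intended when
 * KVM runs as L0, since IPI delivery timing has no architectural
 * guarantee when nested.
 */
static bool toy_exit_before_first_insn(enum toy_exit_method m,
				       bool running_as_l0)
{
	if (m == TOY_EXIT_VIA_PREEMPTION_TIMER)
		return true;
	return running_as_l0;
}
```

The model deliberately reduces the nested self-IPI case to "not guaranteed" rather than trying to simulate when the INTR actually arrives.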