On Tue, Dec 18, 2018 at 10:01 AM Sean Christopherson <sean.j.christopherson@xxxxxxxxx> wrote:
>
> On Tue, Dec 18, 2018 at 09:38:27AM -0800, Jim Mattson wrote:
> > On Tue, Dec 18, 2018 at 7:04 AM Sean Christopherson
> > <sean.j.christopherson@xxxxxxxxx> wrote:
> > >
> > > On Mon, Dec 17, 2018 at 03:14:14PM -0800, Jim Mattson wrote:
> > > > The virtual VMX preemption timer doesn't behave correctly when the
> > > > VMCS12 VMX-preemption timer value field is 0 and there is an injected
> > > > event in the VMCS12. The event should be vectored through the guest
> > > > IDT before the "VMX-preemption timer expired" VM-exit from L2 to L1 is
> > > > synthesized by L0, but it is not. Similarly, the virtual VMX
> > > > preemption timer doesn't behave correctly when the VMCS12
> > > > VMX-preemption timer value field is 0 and there are pending debug
> > > > exceptions in the VMCS12. The pending debug exceptions should be
> > > > delivered before the "VMX-preemption timer expired" VM-exit from L2 to
> > > > L1 is synthesized by L0, but they are not.
> > > >
> > > > The easiest way to fix this is to use the VMX-preemption timer in
> > > > VMCS02 whenever the VMCS12 VMX-preemption timer value field is 0.
> > > > Multiplexing with the existing usage of the VMCS02 VMX-preemption
> > > > timer is straightforward. However, this approach introduces a
> > > > dependency on the underlying hardware having VMX-preemption timer
> > > > support. (Even broken VMX-preemption timer support should be
> > > > sufficient. I know of no VMX preemption-timer errata that would impact
> > > > the case where the VMX-preemption timer value field is 0.)
> > > > Unfortunately, commit f4124500c2c13 ("KVM: nVMX: Fully emulate
> > > > preemption timer") removed the dependency of the virtual
> > > > VMX-preemption timer on a hardware VMX-preemption timer.
> > > >
> > > > I see at least the following three options:
> > > > 1) Require a hardware VMX-preemption timer before advertising a
> > > > virtual VMX-preemption timer.
> > > > 2) Only provide a working virtual VMX-preemption timer when there is a
> > > > hardware VMX-preemption timer, but continue to advertise the broken
> > > > VMX-preemption timer on platforms that don't support a hardware
> > > > VMX-preemption timer.
> > > > 3) Teach kvm how to do guest IDT-vectoring in software, so that a
> > > > hardware VMX-preemption timer isn't necessary.
> > > >
> > > > Thoughts? Other options?
> > >
> > > 4) Move the exception handling out of vmx_check_nested_events() and into
> > > a separate function, and reorder the flow of inject_pending_event()
> > > to prioritize VOE. kvm_vcpu_running() also uses .check_nested_events(),
> > > not sure what needs to be done there.
> >
> > Unless I'm missing something, this reorganization seems orthogonal to
> > (1) or (2). That is, even if we fix the code that was causing us to
> > bypass the launch of vmcs02, how do we get a VM-exit after the event
> > injection if we don't set up a zero-valued VMX-preemption timer in
> > vmcs02?
>
> inject_pending_event should do VOE injection AND return -EBUSY to request
> an immediate exit, e.g. vmx_check_nested_events() should take into account
> the fact that we just injected a VOE, i.e. set block_nested_events.
> request_immediate_exit() will use the preemption timer when possible, so
> it should "just work".
>
> Hardware without a preemption timer should also work. Even though commit
> d264ee0c2ed2 ("KVM: VMX: use preemption timer to force immediate VMExit")
> correctly states that using a self-IPI to request an immediate exit is
> wrong, it's only really wrong in theory. In practice the IPI will arrive
> as soon as the VOE is vectored in the guest (the unit test was failing
> because there was also a bug in KVM's nested INTR handling).

Makes sense. (But what's VOE?)