Re: RFC: Fixing the broken virtual VMX-preemption timer

On Tue, Dec 18, 2018 at 10:22:10AM -0800, Jim Mattson wrote:
> On Tue, Dec 18, 2018 at 10:01 AM Sean Christopherson
> <sean.j.christopherson@xxxxxxxxx> wrote:
> >
> > On Tue, Dec 18, 2018 at 09:38:27AM -0800, Jim Mattson wrote:
> > > On Tue, Dec 18, 2018 at 7:04 AM Sean Christopherson
> > > <sean.j.christopherson@xxxxxxxxx> wrote:
> > > >
> > > > On Mon, Dec 17, 2018 at 03:14:14PM -0800, Jim Mattson wrote:
> > > > > The virtual VMX preemption timer doesn't behave correctly when the
> > > > > VMCS12 VMX-preemption timer value field is 0 and there is an injected
> > > > > event in the VMCS12. The event should be vectored through the guest
> > > > > IDT before the "VMX-preemption timer expired" VM-exit from L2 to L1 is
> > > > > synthesized by L0, but it is not. Similarly, the virtual VMX
> > > > > preemption timer doesn't behave correctly when the VMCS12
> > > > > VMX-preemption timer value field is 0 and there are pending debug
> > > > > exceptions in the VMCS12. The pending debug exceptions should be
> > > > > delivered before the "VMX-preemption timer expired" VM-exit from L2 to
> > > > > L1 is synthesized by L0, but they are not.
> > > > >
> > > > > The easiest way to fix this is to use the VMX-preemption timer in
> > > > > VMCS02 whenever the VMCS12 VMX-preemption timer value field is 0.
> > > > > Multiplexing with the existing usage of the VMCS02 VMX-preemption
> > > > > timer is straightforward. However, this approach introduces a
> > > > > dependency on the underlying hardware having VMX-preemption timer
> > > > > support. (Even broken VMX-preemption timer support should be
> > > > > sufficient. I know of no VMX preemption-timer errata that would impact
> > > > > the case where the VMX-preemption timer value field is 0.)
> > > > > Unfortunately, commit f4124500c2c13 ("KVM: nVMX: Fully emulate
> > > > > preemption timer") removed the dependency of the virtual
> > > > > VMX-preemption timer on a hardware VMX-preemption timer.
> > > > >
> > > > > I see at least the following three options:
> > > > > 1) Require a hardware VMX-preemption timer before advertising a
> > > > > virtual VMX-preemption timer.
> > > > > 2) Only provide a working virtual VMX-preemption timer when there is a
> > > > > hardware VMX-preemption timer, but continue to advertise the broken
> > > > > VMX-preemption timer on platforms that don't support a hardware
> > > > > VMX-preemption timer.
> > > > > 3) Teach kvm how to do guest IDT-vectoring in software, so that a
> > > > > hardware VMX-preemption timer isn't necessary.
> > > > >
> > > > > Thoughts? Other options?
> > > >
> > > > 4) Move the exception handling out of vmx_check_nested_events() and into
> > > >    a separate function, and reorder the flow of inject_pending_event()
> > > >    to prioritize VOE.  kvm_vcpu_running() also uses .check_nested_events(),
> > > >    not sure what needs to be done there.
> > >
> > > Unless I'm missing something, this reorganization seems orthogonal to
> > > (1) or (2). That is, even if we fix the code that was causing us to
> > > bypass the launch of vmcs02, how do we get a VM-exit after the event
> > > injection if we don't set up a zero-valued VMX-preemption timer in
> > > vmcs02?
> >
> > inject_pending_event should do VOE injection AND return -EBUSY to request
> > an immediate exit, e.g. vmx_check_nested_events() should take into account
> > the fact that we just injected a VOE, i.e. set block_nested_events.
> > request_immediate_exit() will use the preemption timer when possible, so
> > it should "just work".
> >
> > Hardware without a preemption timer should also work.  Even though commit
> > d264ee0c2ed2 ("KVM: VMX: use preemption timer to force immediate VMExit")
> > correctly states that using a self-IPI to request an immediate exit is
> > wrong, it's only really wrong in theory.  In practice the IPI will arrive
> > as soon as the VOE is vectored in the guest (the unit test was failing
> > because there was also a bug in KVM's nested INTR handling).
> 
> Makes sense. (But what's VOE?)

Argh, sorry.  Vector-on-entry, i.e. vectored-event injection via
VM_ENTRY_INTR_INFO_FIELD.  I (obviously) don't always remember
that my lexicon doesn't exactly align with the SDM.
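
For anyone else tripped up by the term, here is a rough sketch of what
"vector-on-entry" amounts to at the field level.  It assumes the layout
the SDM gives for the VM-entry interruption-information field (vector in
bits 7:0, type in bits 10:8, deliver-error-code in bit 11, valid in bit
31).  The macro and function names below are made up for illustration
and are not KVM's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names only; layout assumed from the SDM description. */
#define VOE_VECTOR_MASK        0x000000ffu  /* bits 7:0  */
#define VOE_TYPE_SHIFT         8
#define VOE_TYPE_MASK          0x00000700u  /* bits 10:8 */
#define VOE_DELIVER_ERR_CODE   0x00000800u  /* bit 11    */
#define VOE_VALID              0x80000000u  /* bit 31    */

/* Interruption types as encoded in bits 10:8. */
#define VOE_TYPE_EXT_INTR        0u
#define VOE_TYPE_NMI             2u
#define VOE_TYPE_HARD_EXCEPTION  3u
#define VOE_TYPE_SOFT_INTR       4u
#define VOE_TYPE_SOFT_EXCEPTION  6u

/* Build the 32-bit value that would request event injection on VM-entry. */
static uint32_t voe_make(uint8_t vector, unsigned int type, bool has_error_code)
{
	uint32_t info;

	assert(type <= 7);	/* type must fit in three bits */

	info = VOE_VALID | ((uint32_t)type << VOE_TYPE_SHIFT) | vector;
	if (has_error_code)
		info |= VOE_DELIVER_ERR_CODE;
	return info;
}

/* True if the field asks for an event to be vectored on VM-entry. */
static bool voe_pending(uint32_t info)
{
	return (info & VOE_VALID) != 0;
}

/* Extract the vector number from the field. */
static uint8_t voe_vector(uint32_t info)
{
	return (uint8_t)(info & VOE_VECTOR_MASK);
}

/* Extract the interruption type from the field. */
static unsigned int voe_type(uint32_t info)
{
	return (info & VOE_TYPE_MASK) >> VOE_TYPE_SHIFT;
}
```

E.g. voe_make(14, VOE_TYPE_HARD_EXCEPTION, true) describes a #PF with an
error code, and a check along the lines of voe_pending() is the "did we
just inject a VOE" condition referred to above.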