On Wed, Oct 28, 2009 at 06:23:42PM +0200, Orit Wasserman wrote:
> > > Gleb Natapov <gleb@xxxxxxxxxx> wrote on 25/10/2009 11:44:31:
> > > >
> > > > Subject: Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume
> > > >
> > > > On Thu, Oct 22, 2009 at 05:46:16PM +0200, Orit Wasserman wrote:
> > > > > Gleb Natapov <gleb@xxxxxxxxxx> wrote on 22/10/2009 11:04:58:
> > > > > >
> > > > > > On Wed, Oct 21, 2009 at 04:43:44PM +0200, Orit Wasserman wrote:
> > > > > > > @@ -4641,10 +4955,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
> > > > > > >  	int type;
> > > > > > >  	bool idtv_info_valid;
> > > > > > >
> > > > > > > -	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> > > > > > > -
> > > > > > >  	vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
> > > > > > >
> > > > > > > +	if (vmx->nested.nested_mode)
> > > > > > > +		return;
> > > > > > > +
> > > > > > Why return here? What does the function do that should not be done
> > > > > > in nested mode?
> > > > > In nested mode L0 injects an interrupt into L2 only in one scenario:
> > > > > if there is an IDT_VALID event and L0 decides to run L2 again and
> > > > > not to switch back to L1. In all other cases the injection is
> > > > > handled by L1.
> > > > This is exactly the kind of scenario that is handled by
> > > > vmx_complete_interrupts(). (vmx|svm)_complete_interrupts() store the
> > > > pending event in an arch-agnostic way, and re-injection is handled by
> > > > x86.c. You bypass this logic by inserting the return here and by
> > > > introducing the nested_handle_valid_idt() function below.
> > > The only location where we can truly know whether we are switching to
> > > L1 is vmx_vcpu_run, because enable_irq_window (which is called after
> > > handling the exit) can decide to switch to L1 because of an interrupt.
> > enable_irq_window() will be called after the L2 VMCS has been set up for
> > event re-injection by the previous call to inject_pending_event(). As
> > far as I can see this should work for interrupt injection. For
> > exceptions we should probably require the L2 guest to re-execute the
> > faulting instruction for now, like svm does.
> The main issue is that L0 doesn't inject events into L2; the L1 hypervisor
> does (we want to keep the nested hypervisor semantics as much as
> possible). Only if the event was caused by the fact that L2 is a nested
> guest, and L1 can't handle it, will L0 re-inject an event into L2, for
> example an IDT event with a page fault caused by a missing entry in SPT02
> (the shadow page table L0 creates for L2). In this case, when
> vmx_complete_interrupts is called, L0 doesn't know whether the page fault
> should be handled by it or by L1 (that is decided later, when handling
> the exit).
So what? When it is decided that an exit to L1 is needed, the pending event
will be transferred into L2's idt_vectoring_info. Otherwise the event will
be reinjected by the usual mechanism. BTW, I don't see where your current
code sets up L2's idt_vectoring_info when it is decided that L1 should
handle the event reinjection.
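For illustration, a minimal sketch of that handover. This is not the actual
patch code: vmcs12_sketch is a hypothetical stand-in for whatever structure
the nested-VMX patches use to hold the L1-visible VMCS image of L2, and
transfer_pending_event_to_l1() is an invented name; only the VMCS field
constants and vmcs_read32() are real vmx.c/vmx.h names.

/*
 * Hypothetical stand-in for the L1-visible VMCS image of L2.
 */
struct vmcs12_sketch {
	u32 idt_vectoring_info;
	u32 idt_vectoring_error_code;
	/* ... the rest of the L1-visible VMCS fields for L2 ... */
};

/*
 * Called once L0 has decided to reflect the exit from L2 to L1:
 * instead of requeueing the interrupted event for direct reinjection
 * into L2, expose it to L1 through the L2 VMCS image so that the L1
 * hypervisor decides whether and how to reinject it.
 */
static void transfer_pending_event_to_l1(struct vmcs12_sketch *vmcs12)
{
	u32 idt_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);

	vmcs12->idt_vectoring_info = idt_info;
	if ((idt_info & VECTORING_INFO_VALID_MASK) &&
	    (idt_info & VECTORING_INFO_DELIVER_CODE_MASK))
		vmcs12->idt_vectoring_error_code =
			vmcs_read32(IDT_VECTORING_ERROR_CODE);
}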
> In most other cases, L0 will switch to L1, and L1 will decide whether
> there will be re-injection (depending on the L1 hypervisor's logic) and
> update the L2 VMCS accordingly.
>
> > > In order to simplify our code, it was simpler to bypass
> > > vmx_complete_interrupts when it is called (after running L2) and to
> > > add nested_handle_valid_idt just before running L2.
> > > > > > > +	exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> > > > > > > +
> > > > > > >  	/* Handle machine checks before interrupts are enabled */
> > > > > > >  	if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
> > > > > > >  	    || (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI
> > > > > > > @@ -4747,6 +5064,60 @@ static void fixup_rmode_irq(struct vcpu_vmx *vmx)
> > > > > > >  		| vmx->rmode.irq.vector;
> > > > > > >  }
> > > > > > >
> > > > > > > +static int nested_handle_valid_idt(struct kvm_vcpu *vcpu)
> > > > > > > +{
> > > > > > It seems that with this function you are trying to bypass the
> > > > > > general event re-injection logic. Why?
> > > > > See above.
> > > > The logic implemented by this function is handled in x86.c in an
> > > > arch-agnostic way. Is there something wrong with it?
> > > See my comment above.
> > Sometimes it is wrong to reinject events from L0 into L2 directly. If
> > L2 was not able to handle an event because its IDT is not mapped by the
> > L1 shadow page table, we should generate a PF vmexit with valid idt
> > vectoring info to L1 and let L1 handle the event reinjection.
> >
> > --
> > 	Gleb.

--
	Gleb.
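To make the reinjection decision discussed above concrete, here is a rough
sketch continuing the vmcs12_sketch example from earlier in this mail.
Everything here is illustrative, not the patch's actual code:
l0_handles_this_exit() is a hypothetical predicate, and a real
implementation would also handle the error code and NMI unblocking rather
than just replaying the raw vectoring info.

/*
 * On an exit from L2 with a valid IDT_VECTORING_INFO_FIELD, L0 either
 * fixes the problem itself and replays the event into L2, or reflects
 * the exit (and the vectoring info) to L1 and lets L1 decide about
 * reinjection.
 */
static int sketch_complete_l2_event(struct kvm_vcpu *vcpu,
				    struct vmcs12_sketch *vmcs12)
{
	u32 idt_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);

	/* No event delivery was interrupted by this exit. */
	if (!(idt_info & VECTORING_INFO_VALID_MASK))
		return 0;

	if (l0_handles_this_exit(vcpu)) {
		/*
		 * E.g. a #PF on a missing SPT02 entry: L0 repairs its
		 * shadow page table and resumes L2, so the interrupted
		 * event is replayed into L2 directly (simplified; the
		 * usual non-nested path does this via
		 * vmx_complete_interrupts() and inject_pending_event()).
		 */
		vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, idt_info);
		return 0;	/* resume L2 */
	}

	/*
	 * Otherwise reflect the exit to L1: hand the vectoring info to
	 * L1 through L2's VMCS image and do not requeue the event,
	 * since reinjection is now L1's decision.
	 */
	transfer_pending_event_to_l1(vmcs12);
	return 1;	/* switch to L1 */
}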