Re: [PATCH 5/5] Nested VMX patch 5 implements vmlaunch and vmresume

Gleb Natapov <gleb@xxxxxxxxxx> wrote on 25/10/2009 11:44:31:

> On Thu, Oct 22, 2009 at 05:46:16PM +0200, Orit Wasserman wrote:
> >
> >
> > Gleb Natapov <gleb@xxxxxxxxxx> wrote on 22/10/2009 11:04:58:
> >
> > > On Wed, Oct 21, 2009 at 04:43:44PM +0200, Orit Wasserman wrote:
> > > > > > @@ -4641,10 +4955,13 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
> > > > > >     int type;
> > > > > >     bool idtv_info_valid;
> > > > > >
> > > > > > -   exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> > > > > > -
> > > > > >     vmx->exit_reason = vmcs_read32(VM_EXIT_REASON);
> > > > > >
> > > > > > +   if (vmx->nested.nested_mode)
> > > > > > +      return;
> > > > > > +
> > > > > Why return here? What does the function do that should not be
> > > > > done in nested mode?
> > > > In nested mode L0 injects an interrupt into L2 only in one
> > > > scenario: if there is an IDT_VALID event and L0 decides to run L2
> > > > again rather than switch back to L1.
> > > > In all other cases the injection is handled by L1.
> > > This is exactly the kind of scenario that is handled by
> > > vmx_complete_interrupts(). (vmx|svm)_complete_interrupts() stores
> > > the pending event in an arch-agnostic way, and re-injection is
> > > handled by x86.c. You bypass this logic by inserting a return here
> > > and introducing the nested_handle_valid_idt() function below.
> > The only place where we can truly know whether we are switching to L1
> > is in vmx_vcpu_run, because enable_irq_window (which is called after
> > handling the exit) can decide to switch to L1 because of an interrupt.
> enable_irq_window() will be called after the L2 VMCS has been set up
> for event re-injection by a previous call to inject_pending_event().
> As far as I can see this should work for interrupt injection. For
> exceptions we should probably require the L2 guest to re-execute the
> faulting instruction for now, like SVM does.
The main issue is that L0 doesn't inject events into L2; the L1
hypervisor does (we want to preserve the nested hypervisor's semantics
as much as possible). Only if the event was caused by the fact that L2
is a nested guest, and L1 can't handle it, will L0 re-inject an event
into L2: for example, an IDT event with a page fault caused by a
missing entry in SPT02 (the shadow page table L0 creates for L2).
In this case, when vmx_complete_interrupts is called, L0 doesn't yet
know whether the page fault should be handled by it or by L1 (that is
decided later, when handling the exit).
In most other cases L0 will switch to L1, and L1 will decide whether to
re-inject (depending on the L1 hypervisor's logic) and update the L2
VMCS accordingly.
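
To make the flow concrete, here is a rough sketch of the pre-entry
check (illustrative only, not the patch itself: the VMCS accessors and
masks are the standard KVM ones, vmx->nested.nested_mode is from this
series, and nested_exit_handled_by_l0() is a made-up placeholder for
the "was this exit caused by nesting and invisible to L1?" decision):

static int nested_handle_valid_idt(struct kvm_vcpu *vcpu)
{
	struct vcpu_vmx *vmx = to_vmx(vcpu);
	u32 idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);

	/* Only relevant when we are about to resume a nested (L2) guest. */
	if (!vmx->nested.nested_mode)
		return 0;

	/* No event delivery was in progress when the last exit occurred. */
	if (!(idt_vectoring_info & VECTORING_INFO_VALID_MASK))
		return 0;

	/*
	 * Re-inject from L0 only when the exit that interrupted delivery
	 * was caused by nesting itself (e.g. a #PF on a missing SPT02
	 * entry) and is therefore invisible to L1; otherwise we switch
	 * to L1 and let it decide about re-injection.
	 */
	if (!nested_exit_handled_by_l0(vcpu))
		return 0;

	vmcs_write32(VM_ENTRY_INTR_INFO_FIELD, idt_vectoring_info);
	/* A complete version would also copy the error code and, for
	 * software interrupts/exceptions, the instruction length. */
	return 1;
}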
>
> > To keep our code simple, we bypass vmx_complete_interrupts when it is
> > called (after running L2) and instead call nested_handle_valid_idt
> > just before running L2.
> > > > >
> > > > > > +   exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
> > > > > > +
> > > > > >     /* Handle machine checks before interrupts are enabled */
> > > > > >     if ((vmx->exit_reason == EXIT_REASON_MCE_DURING_VMENTRY)
> > > > > >         || (vmx->exit_reason == EXIT_REASON_EXCEPTION_NMI
> > > > > > @@ -4747,6 +5064,60 @@ static void fixup_rmode_irq(struct vcpu_vmx *vmx)
> > > > > >        | vmx->rmode.irq.vector;
> > > > > >  }
> > > > > >
> > > > > > +static int nested_handle_valid_idt(struct kvm_vcpu *vcpu)
> > > > > > +{
> > > > > It seems that with this function you are trying to bypass the
> > > > > general event re-injection logic. Why?
> > > > See above.
> > > The logic implemented by this function is handled in x86.c in an
> > > arch-agnostic way. Is there something wrong with that?
> > See my earlier comment.
> Sometimes it is wrong to re-inject events from L0 into L2 directly. If
> L2 was not able to handle an event because its IDT is not mapped by
> L1's shadow page table, we should generate a PF vmexit with valid IDT
> vectoring info to L1 and let L1 handle the event re-injection.
>
> --
>          Gleb.
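
To illustrate the case Gleb describes, reflecting such an exit to L1
might look roughly like this (a hypothetical sketch: the VMCS read
accessors and field encodings are the standard KVM ones, while struct
l2_vmcs_fields and nested_reflect_pf_exit() are made-up names standing
in for wherever this series keeps the L1-visible exit fields):

/* Forward a #PF exit, together with the IDT vectoring info of the
 * interrupted event, to L1 instead of fixing it up in L0. */
static void nested_reflect_pf_exit(struct kvm_vcpu *vcpu,
				   struct l2_vmcs_fields *l2_vmcs)
{
	l2_vmcs->vm_exit_reason = EXIT_REASON_EXCEPTION_NMI;
	l2_vmcs->vm_exit_intr_info = vmcs_read32(VM_EXIT_INTR_INFO);
	/* For a #PF the exit qualification holds the faulting address. */
	l2_vmcs->exit_qualification = vmcs_readl(EXIT_QUALIFICATION);

	/* Valid vectoring info tells L1 that an event delivery was in
	 * progress, so L1 can re-inject it after handling the #PF. */
	l2_vmcs->idt_vectoring_info = vmcs_read32(IDT_VECTORING_INFO_FIELD);
	l2_vmcs->idt_vectoring_error_code =
		vmcs_read32(IDT_VECTORING_ERROR_CODE);
}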
