Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

Abel Gordon <ABELG@xxxxxxxxxx> · Sun, 25 Aug 2013 11:39:30 +0300

Jan Kiszka <jan.kiszka@xxxxxx> wrote on 25/08/2013 11:27:22 AM:

> From: Jan Kiszka <jan.kiszka@xxxxxx>
> To: Abel Gordon/Haifa/IBM@IBMIL,
> Cc: gleb@xxxxxxxxxx, kvm <kvm@xxxxxxxxxxxxxxx>, pbonzini@xxxxxxxxxx,
> "李春奇 <Arthur Chunqi Li>" <yzt356@xxxxxxxxx>
> Date: 25/08/2013 11:27 AM
> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
timer
>
> On 2013-08-25 10:25, Jan Kiszka wrote:
> > On 2013-08-25 10:18, Abel Gordon wrote:
> >>
> >>
> >> kvm-owner@xxxxxxxxxxxxxxx wrote on 25/08/2013 10:54:13 AM:
> >>
> >>> From: Jan Kiszka <jan.kiszka@xxxxxx>
> >>> To: Abel Gordon/Haifa/IBM@IBMIL,
> >>> Cc: gleb@xxxxxxxxxx, kvm <kvm@xxxxxxxxxxxxxxx>, pbonzini@xxxxxxxxxx,
> >>> "李春奇 <Arthur Chunqi Li>"  <yzt356@xxxxxxxxx>
> >>> Date: 25/08/2013 10:54 AM
> >>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
preemption
> >> timer
> >>> Sent by: kvm-owner@xxxxxxxxxxxxxxx
> >>>
> >>> On 2013-08-25 09:50, Abel Gordon wrote:
> >>>>
> >>>>
> >>>> kvm-owner@xxxxxxxxxxxxxxx wrote on 25/08/2013 10:43:12 AM:
> >>>>
> >>>>> From: Jan Kiszka <jan.kiszka@xxxxxx>
> >>>>> To: Abel Gordon/Haifa/IBM@IBMIL,
> >>>>> Cc: gleb@xxxxxxxxxx, kvm@xxxxxxxxxxxxxxx,
kvm-owner@xxxxxxxxxxxxxxx,
> >>>>> pbonzini@xxxxxxxxxx, "李春奇 <Arthur Chunqi Li>" <yzt356@xxxxxxxxx>
> >>>>> Date: 25/08/2013 10:43 AM
> >>>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
preemption
> >>>> timer
> >>>>> Sent by: kvm-owner@xxxxxxxxxxxxxxx
> >>>>>
> >>>>> On 2013-08-25 09:37, Abel Gordon wrote:
> >>>>>>
> >>>>>>
> >>>>>>> From: Jan Kiszka <jan.kiszka@xxxxxx>
> >>>>>>> To: "李春奇 <Arthur Chunqi Li>"  <yzt356@xxxxxxxxx>,
> >>>>>>> Cc: kvm@xxxxxxxxxxxxxxx, gleb@xxxxxxxxxx, pbonzini@xxxxxxxxxx
> >>>>>>> Date: 25/08/2013 09:44 AM
> >>>>>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
> >> preemption
> >>>>>> timer
> >>>>>>> Sent by: kvm-owner@xxxxxxxxxxxxxxx
> >>>>>>>
> >>>>>>> On 2013-08-24 20:44, root wrote:
> >>>>>>>> This patch contains the following two changes:
> >>>>>>>> 1. Fix the bug in nested preemption timer support. If vmexit
L2->L0
> >>>>>>>> with some reasons not emulated by L1, preemption timer value
should
> >>>>>>>> be save in such exits.
> >>>>>>>> 2. Add support of "Save VMX-preemption timer value" VM-Exit
> >> controls
> >>>>>>>> to nVMX.
> >>>>>>>>
> >>>>>>>> With this patch, nested VMX preemption timer features are fully
> >>>>>>>> supported.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Arthur Chunqi Li <yzt356@xxxxxxxxx>
> >>>>>>>> ---
> >>>>>>
> >>>>>>>>
> >>>>>>>> @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct
kvm_vcpu
> >>>>>>> *vcpu, struct vmcs12 *vmcs12)
> >>>>>>>>        (vmcs_config.pin_based_exec_ctrl |
> >>>>>>>>         vmcs12->pin_based_vm_exec_control));
> >>>>>>>>
> >>>>>>>> -   if (vmcs12->pin_based_vm_exec_control &
> >>>>>> PIN_BASED_VMX_PREEMPTION_TIMER)
> >>>>>>>> -      vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
> >>>>>>>> -              vmcs12->vmx_preemption_timer_value);
> >>>>>>>> +   if (vmcs12->pin_based_vm_exec_control &
> >>>>>>> PIN_BASED_VMX_PREEMPTION_TIMER) {
> >>>>>>>> +      if (vmcs12->vm_exit_controls &
> >>>>>> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
> >>>>>>>> +         vmcs12->vmx_preemption_timer_value =
> >>>>>>>> +            vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
> >>>>>>>> +      else
> >>>>>>>> +         vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
> >>>>>>>> +               vmcs12->vmx_preemption_timer_value);
> >>>>>>>> +   }
> >>>>>>>
> >>>>>>> This is not correct. We still need to set the vmcs to
> >>>>>>> vmx_preemption_timer_value. The difference is that, on exit from
L2,
> >>>>>>> vmx_preemption_timer_value has to be updated according to the
saved
> >>>>>>> hardware state. The corresponding code is missing in your patch
so
> >>>> far.
> >>>>>>
> >>>>>> I think something else maybe be missing here: assuming L0 handles
> >> exits
> >>>>>> for L2 without involving L1 (e.g. external interrupts or ept
> >>>> violations),
> >>>>>> then, we may spend some cycles in L0 handling these exits. Note L1
is
> >>>> not
> >>>>>> aware of these exits and from L1 perspective L2 was running on the
> >> CPU.
> >>>>>> That means that we may need to reduce these cycles spent at
> >>>>>> L0 from the preemtion timer or emulate a preemption timer exit to
> >>>>>> force a transition to L1 instead of resuming L2.
> >>>>>
> >>>>> That's precisely what the logic I described should achieve: reload
the
> >>>>> value we saved on L2 exit on reentry.
> >>>>
> >>>> But don't you think we should also reduce the cycles spent at L0
from
> >> the
> >>>> preemption timer ? I mean, if we spent X cycles at L0 handling a L2
> >> exit
> >>>> which was not forwarded to L1, then, before we resume L2,
> >>>> the preemption timer should be: (previous_value_on_exit - X).
> >>>> If (previous_value_on_exit - X) < 0, then we should force
("emulate") a
> >>>> preemption timer exit between L2 and L1.
> >>>
> >>> We ask the hardware to save the value of the preemption on L2 exit.
This
> >>> value will be exposed to L1 (if it asked for saving as well) and/or
be
> >>> written back to the hardware on L2 reenty (unless L1 had a chance to
run
> >>> and modified it). So the time spent in L0 is implicitly subtracted.
> >>
> >> I think you are suggesting the following, please correct me if I am
wrong.
> >> 1) L1 resumes L2 with preemption timer enabled
> >> 2) L0 emulates the resume/launch
> >> 3) L2 runs for Y cycles until an external interrupt occurs (Y <
preemption
> >> timer specified by L1)
> >> 4) L0 saved the preemption timer (original value - Y)
> >> 5) L0 spends X cycles handling the external interrupt
> >> 6) L0 resumes L2 with preemption timer = original value - Y
> >>
> >> Note that in this case "X is ignored".
> >
> > Yes, but see my other reply.

I sent my reply before I read yours, sorry.
Anyway, we are now on the same page ;)

> >> I was suggesting to do the following:
> >> 6) If original value - Y - X > 0 then
> >>  L0 resumes L2 with preemption timer = original value - Y - X
> >> else
> >>  L0 emulates a L2->L1 preemption timer exit (resumes L1)
> >
> > Almost . 6) should be:
> > If exit to L1 occurred after last L2, set X to 0. Then load MAX
(original
> > value - Y - X, 0).
>
> Hmm, no:
>
> If exit to L1 occurred after last L2, load value of vmcs12, else load
> MAX(original
> value - Y - X, 0).

Note you are resuming L2 to force an immediate exit. I agree this approach
will be easier and cleaner to implement/maintain but it could force one
more
exit and entry.  Anyway, any approach is welcome as long as it considers
the
cycles spent at L0 ("X") as we previously discussed and agreed.

Regards,
Abel.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html