Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

Abel Gordon <ABELG@xxxxxxxxxx> · Sun, 25 Aug 2013 11:18:42 +0300

kvm-owner@xxxxxxxxxxxxxxx wrote on 25/08/2013 10:54:13 AM:

> From: Jan Kiszka <jan.kiszka@xxxxxx>
> To: Abel Gordon/Haifa/IBM@IBMIL,
> Cc: gleb@xxxxxxxxxx, kvm <kvm@xxxxxxxxxxxxxxx>, pbonzini@xxxxxxxxxx,
> "李春奇 <Arthur Chunqi Li>"  <yzt356@xxxxxxxxx>
> Date: 25/08/2013 10:54 AM
> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
timer
> Sent by: kvm-owner@xxxxxxxxxxxxxxx
>
> On 2013-08-25 09:50, Abel Gordon wrote:
> >
> >
> > kvm-owner@xxxxxxxxxxxxxxx wrote on 25/08/2013 10:43:12 AM:
> >
> >> From: Jan Kiszka <jan.kiszka@xxxxxx>
> >> To: Abel Gordon/Haifa/IBM@IBMIL,
> >> Cc: gleb@xxxxxxxxxx, kvm@xxxxxxxxxxxxxxx, kvm-owner@xxxxxxxxxxxxxxx,
> >> pbonzini@xxxxxxxxxx, "李春奇 <Arthur Chunqi Li>" <yzt356@xxxxxxxxx>
> >> Date: 25/08/2013 10:43 AM
> >> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
> > timer
> >> Sent by: kvm-owner@xxxxxxxxxxxxxxx
> >>
> >> On 2013-08-25 09:37, Abel Gordon wrote:
> >>>
> >>>
> >>>> From: Jan Kiszka <jan.kiszka@xxxxxx>
> >>>> To: "李春奇 <Arthur Chunqi Li>"  <yzt356@xxxxxxxxx>,
> >>>> Cc: kvm@xxxxxxxxxxxxxxx, gleb@xxxxxxxxxx, pbonzini@xxxxxxxxxx
> >>>> Date: 25/08/2013 09:44 AM
> >>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
preemption
> >>> timer
> >>>> Sent by: kvm-owner@xxxxxxxxxxxxxxx
> >>>>
> >>>> On 2013-08-24 20:44, root wrote:
> >>>>> This patch contains the following two changes:
> >>>>> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0
> >>>>> with some reasons not emulated by L1, preemption timer value should
> >>>>> be save in such exits.
> >>>>> 2. Add support of "Save VMX-preemption timer value" VM-Exit
controls
> >>>>> to nVMX.
> >>>>>
> >>>>> With this patch, nested VMX preemption timer features are fully
> >>>>> supported.
> >>>>>
> >>>>> Signed-off-by: Arthur Chunqi Li <yzt356@xxxxxxxxx>
> >>>>> ---
> >>>
> >>>>>
> >>>>> @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
> >>>> *vcpu, struct vmcs12 *vmcs12)
> >>>>>        (vmcs_config.pin_based_exec_ctrl |
> >>>>>         vmcs12->pin_based_vm_exec_control));
> >>>>>
> >>>>> -   if (vmcs12->pin_based_vm_exec_control &
> >>> PIN_BASED_VMX_PREEMPTION_TIMER)
> >>>>> -      vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
> >>>>> -              vmcs12->vmx_preemption_timer_value);
> >>>>> +   if (vmcs12->pin_based_vm_exec_control &
> >>>> PIN_BASED_VMX_PREEMPTION_TIMER) {
> >>>>> +      if (vmcs12->vm_exit_controls &
> >>> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
> >>>>> +         vmcs12->vmx_preemption_timer_value =
> >>>>> +            vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
> >>>>> +      else
> >>>>> +         vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
> >>>>> +               vmcs12->vmx_preemption_timer_value);
> >>>>> +   }
> >>>>
> >>>> This is not correct. We still need to set the vmcs to
> >>>> vmx_preemption_timer_value. The difference is that, on exit from L2,
> >>>> vmx_preemption_timer_value has to be updated according to the saved
> >>>> hardware state. The corresponding code is missing in your patch so
> > far.
> >>>
> >>> I think something else maybe be missing here: assuming L0 handles
exits
> >>> for L2 without involving L1 (e.g. external interrupts or ept
> > violations),
> >>> then, we may spend some cycles in L0 handling these exits. Note L1 is
> > not
> >>> aware of these exits and from L1 perspective L2 was running on the
CPU.
> >>> That means that we may need to reduce these cycles spent at
> >>> L0 from the preemtion timer or emulate a preemption timer exit to
> >>> force a transition to L1 instead of resuming L2.
> >>
> >> That's precisely what the logic I described should achieve: reload the
> >> value we saved on L2 exit on reentry.
> >
> > But don't you think we should also reduce the cycles spent at L0 from
the
> > preemption timer ? I mean, if we spent X cycles at L0 handling a L2
exit
> > which was not forwarded to L1, then, before we resume L2,
> > the preemption timer should be: (previous_value_on_exit - X).
> > If (previous_value_on_exit - X) < 0, then we should force ("emulate") a
> > preemption timer exit between L2 and L1.
>
> We ask the hardware to save the value of the preemption on L2 exit. This
> value will be exposed to L1 (if it asked for saving as well) and/or be
> written back to the hardware on L2 reenty (unless L1 had a chance to run
> and modified it). So the time spent in L0 is implicitly subtracted.

I think you are suggesting the following, please correct me if I am wrong.
1) L1 resumes L2 with preemption timer enabled
2) L0 emulates the resume/launch
3) L2 runs for Y cycles until an external interrupt occurs (Y < preemption
timer specified by L1)
4) L0 saved the preemption timer (original value - Y)
5) L0 spends X cycles handling the external interrupt
6) L0 resumes L2 with preemption timer = original value - Y

Note that in this case "X is ignored".

I was suggesting to do the following:
6) If original value - Y - X > 0 then
 L0 resumes L2 with preemption timer = original value - Y - X
else
 L0 emulates a L2->L1 preemption timer exit (resumes L1)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html