Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

Arthur Chunqi Li <yzt356@xxxxxxxxx> · Sun, 25 Aug 2013 16:41:46 +0800

On Sun, Aug 25, 2013 at 4:18 PM, Abel Gordon <ABELG@xxxxxxxxxx> wrote:
>
>
> kvm-owner@xxxxxxxxxxxxxxx wrote on 25/08/2013 10:54:13 AM:
>
>> From: Jan Kiszka <jan.kiszka@xxxxxx>
>> To: Abel Gordon/Haifa/IBM@IBMIL,
>> Cc: gleb@xxxxxxxxxx, kvm <kvm@xxxxxxxxxxxxxxx>, pbonzini@xxxxxxxxxx,
>> "李春奇 <Arthur Chunqi Li>"  <yzt356@xxxxxxxxx>
>> Date: 25/08/2013 10:54 AM
>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
> timer
>> Sent by: kvm-owner@xxxxxxxxxxxxxxx
>>
>> On 2013-08-25 09:50, Abel Gordon wrote:
>> >
>> >
>> > kvm-owner@xxxxxxxxxxxxxxx wrote on 25/08/2013 10:43:12 AM:
>> >
>> >> From: Jan Kiszka <jan.kiszka@xxxxxx>
>> >> To: Abel Gordon/Haifa/IBM@IBMIL,
>> >> Cc: gleb@xxxxxxxxxx, kvm@xxxxxxxxxxxxxxx, kvm-owner@xxxxxxxxxxxxxxx,
>> >> pbonzini@xxxxxxxxxx, "李春奇 <Arthur Chunqi Li>" <yzt356@xxxxxxxxx>
>> >> Date: 25/08/2013 10:43 AM
>> >> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
>> > timer
>> >> Sent by: kvm-owner@xxxxxxxxxxxxxxx
>> >>
>> >> On 2013-08-25 09:37, Abel Gordon wrote:
>> >>>
>> >>>
>> >>>> From: Jan Kiszka <jan.kiszka@xxxxxx>
>> >>>> To: "李春奇 <Arthur Chunqi Li>"  <yzt356@xxxxxxxxx>,
>> >>>> Cc: kvm@xxxxxxxxxxxxxxx, gleb@xxxxxxxxxx, pbonzini@xxxxxxxxxx
>> >>>> Date: 25/08/2013 09:44 AM
>> >>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
> preemption
>> >>> timer
>> >>>> Sent by: kvm-owner@xxxxxxxxxxxxxxx
>> >>>>
>> >>>> On 2013-08-24 20:44, root wrote:
>> >>>>> This patch contains the following two changes:
>> >>>>> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0
>> >>>>> with some reasons not emulated by L1, preemption timer value should
>> >>>>> be save in such exits.
>> >>>>> 2. Add support of "Save VMX-preemption timer value" VM-Exit
> controls
>> >>>>> to nVMX.
>> >>>>>
>> >>>>> With this patch, nested VMX preemption timer features are fully
>> >>>>> supported.
>> >>>>>
>> >>>>> Signed-off-by: Arthur Chunqi Li <yzt356@xxxxxxxxx>
>> >>>>> ---
>> >>>
>> >>>>>
>> >>>>> @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
>> >>>> *vcpu, struct vmcs12 *vmcs12)
>> >>>>>        (vmcs_config.pin_based_exec_ctrl |
>> >>>>>         vmcs12->pin_based_vm_exec_control));
>> >>>>>
>> >>>>> -   if (vmcs12->pin_based_vm_exec_control &
>> >>> PIN_BASED_VMX_PREEMPTION_TIMER)
>> >>>>> -      vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
>> >>>>> -              vmcs12->vmx_preemption_timer_value);
>> >>>>> +   if (vmcs12->pin_based_vm_exec_control &
>> >>>> PIN_BASED_VMX_PREEMPTION_TIMER) {
>> >>>>> +      if (vmcs12->vm_exit_controls &
>> >>> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
>> >>>>> +         vmcs12->vmx_preemption_timer_value =
>> >>>>> +            vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
>> >>>>> +      else
>> >>>>> +         vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
>> >>>>> +               vmcs12->vmx_preemption_timer_value);
>> >>>>> +   }
>> >>>>
>> >>>> This is not correct. We still need to set the vmcs to
>> >>>> vmx_preemption_timer_value. The difference is that, on exit from L2,
>> >>>> vmx_preemption_timer_value has to be updated according to the saved
>> >>>> hardware state. The corresponding code is missing in your patch so
>> > far.
>> >>>
>> >>> I think something else maybe be missing here: assuming L0 handles
> exits
>> >>> for L2 without involving L1 (e.g. external interrupts or ept
>> > violations),
>> >>> then, we may spend some cycles in L0 handling these exits. Note L1 is
>> > not
>> >>> aware of these exits and from L1 perspective L2 was running on the
> CPU.
>> >>> That means that we may need to reduce these cycles spent at
>> >>> L0 from the preemtion timer or emulate a preemption timer exit to
>> >>> force a transition to L1 instead of resuming L2.
>> >>
>> >> That's precisely what the logic I described should achieve: reload the
>> >> value we saved on L2 exit on reentry.
>> >
>> > But don't you think we should also reduce the cycles spent at L0 from
> the
>> > preemption timer ? I mean, if we spent X cycles at L0 handling a L2
> exit
>> > which was not forwarded to L1, then, before we resume L2,
>> > the preemption timer should be: (previous_value_on_exit - X).
>> > If (previous_value_on_exit - X) < 0, then we should force ("emulate") a
>> > preemption timer exit between L2 and L1.
>>
>> We ask the hardware to save the value of the preemption on L2 exit. This
>> value will be exposed to L1 (if it asked for saving as well) and/or be
>> written back to the hardware on L2 reenty (unless L1 had a chance to run
>> and modified it). So the time spent in L0 is implicitly subtracted.
>
> I think you are suggesting the following, please correct me if I am wrong.
> 1) L1 resumes L2 with preemption timer enabled
> 2) L0 emulates the resume/launch
> 3) L2 runs for Y cycles until an external interrupt occurs (Y < preemption
> timer specified by L1)
> 4) L0 saved the preemption timer (original value - Y)
> 5) L0 spends X cycles handling the external interrupt
> 6) L0 resumes L2 with preemption timer = original value - Y
>
> Note that in this case "X is ignored".
>
> I was suggesting to do the following:
> 6) If original value - Y - X > 0 then
>  L0 resumes L2 with preemption timer = original value - Y - X
> else
>  L0 emulates a L2->L1 preemption timer exit (resumes L1)
Yes, your description is right. But I'm also thinking about my
previous consideration, why should we consider such X cycles as what
L2 spent. For nested VMX. external interrupt is not provided by L1, it
is triggered from L0 and want to cause periodically exit to L1, L2 is
"accidentally injure" actually. Since these interrupts are not
generated from L1 and not attend to affect L2, these cycles should not
be treated as what L2 spent. Though these cycles are "spent" in view
of L1, but they should not be taken into consideration in nested VMX.

For another example, if vcpu scheduled out when L0 handing such
interrupts and CPU does some other things then schedule this vcpu
again, these cycles of executing other processes should not be treated
as what L2 spent definitely.

Arthur
>
>
>
>
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html