Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

Jan Kiszka <jan.kiszka@xxxxxx> · Sun, 25 Aug 2013 11:08:35 +0200

On 2013-08-25 11:07, Arthur Chunqi Li wrote:
> On Sun, Aug 25, 2013 at 4:53 PM, Jan Kiszka <jan.kiszka@xxxxxx> wrote:
>> On 2013-08-25 10:41, Arthur Chunqi Li wrote:
>>> On Sun, Aug 25, 2013 at 4:18 PM, Abel Gordon <ABELG@xxxxxxxxxx> wrote:
>>>>
>>>>
>>>> kvm-owner@xxxxxxxxxxxxxxx wrote on 25/08/2013 10:54:13 AM:
>>>>
>>>>> From: Jan Kiszka <jan.kiszka@xxxxxx>
>>>>> To: Abel Gordon/Haifa/IBM@IBMIL,
>>>>> Cc: gleb@xxxxxxxxxx, kvm <kvm@xxxxxxxxxxxxxxx>, pbonzini@xxxxxxxxxx,
>>>>> "李春奇 <Arthur Chunqi Li>"  <yzt356@xxxxxxxxx>
>>>>> Date: 25/08/2013 10:54 AM
>>>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
>>>> timer
>>>>> Sent by: kvm-owner@xxxxxxxxxxxxxxx
>>>>>
>>>>> On 2013-08-25 09:50, Abel Gordon wrote:
>>>>>>
>>>>>>
>>>>>> kvm-owner@xxxxxxxxxxxxxxx wrote on 25/08/2013 10:43:12 AM:
>>>>>>
>>>>>>> From: Jan Kiszka <jan.kiszka@xxxxxx>
>>>>>>> To: Abel Gordon/Haifa/IBM@IBMIL,
>>>>>>> Cc: gleb@xxxxxxxxxx, kvm@xxxxxxxxxxxxxxx, kvm-owner@xxxxxxxxxxxxxxx,
>>>>>>> pbonzini@xxxxxxxxxx, "李春奇 <Arthur Chunqi Li>" <yzt356@xxxxxxxxx>
>>>>>>> Date: 25/08/2013 10:43 AM
>>>>>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption
>>>>>> timer
>>>>>>> Sent by: kvm-owner@xxxxxxxxxxxxxxx
>>>>>>>
>>>>>>> On 2013-08-25 09:37, Abel Gordon wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>> From: Jan Kiszka <jan.kiszka@xxxxxx>
>>>>>>>>> To: "李春奇 <Arthur Chunqi Li>"  <yzt356@xxxxxxxxx>,
>>>>>>>>> Cc: kvm@xxxxxxxxxxxxxxx, gleb@xxxxxxxxxx, pbonzini@xxxxxxxxxx
>>>>>>>>> Date: 25/08/2013 09:44 AM
>>>>>>>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX
>>>> preemption
>>>>>>>> timer
>>>>>>>>> Sent by: kvm-owner@xxxxxxxxxxxxxxx
>>>>>>>>>
>>>>>>>>> On 2013-08-24 20:44, root wrote:
>>>>>>>>>> This patch contains the following two changes:
>>>>>>>>>> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0
>>>>>>>>>> with some reasons not emulated by L1, preemption timer value should
>>>>>>>>>> be save in such exits.
>>>>>>>>>> 2. Add support of "Save VMX-preemption timer value" VM-Exit
>>>> controls
>>>>>>>>>> to nVMX.
>>>>>>>>>>
>>>>>>>>>> With this patch, nested VMX preemption timer features are fully
>>>>>>>>>> supported.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Arthur Chunqi Li <yzt356@xxxxxxxxx>
>>>>>>>>>> ---
>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu
>>>>>>>>> *vcpu, struct vmcs12 *vmcs12)
>>>>>>>>>>        (vmcs_config.pin_based_exec_ctrl |
>>>>>>>>>>         vmcs12->pin_based_vm_exec_control));
>>>>>>>>>>
>>>>>>>>>> -   if (vmcs12->pin_based_vm_exec_control &
>>>>>>>> PIN_BASED_VMX_PREEMPTION_TIMER)
>>>>>>>>>> -      vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
>>>>>>>>>> -              vmcs12->vmx_preemption_timer_value);
>>>>>>>>>> +   if (vmcs12->pin_based_vm_exec_control &
>>>>>>>>> PIN_BASED_VMX_PREEMPTION_TIMER) {
>>>>>>>>>> +      if (vmcs12->vm_exit_controls &
>>>>>>>> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
>>>>>>>>>> +         vmcs12->vmx_preemption_timer_value =
>>>>>>>>>> +            vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
>>>>>>>>>> +      else
>>>>>>>>>> +         vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
>>>>>>>>>> +               vmcs12->vmx_preemption_timer_value);
>>>>>>>>>> +   }
>>>>>>>>>
>>>>>>>>> This is not correct. We still need to set the vmcs to
>>>>>>>>> vmx_preemption_timer_value. The difference is that, on exit from L2,
>>>>>>>>> vmx_preemption_timer_value has to be updated according to the saved
>>>>>>>>> hardware state. The corresponding code is missing in your patch so
>>>>>> far.
>>>>>>>>
>>>>>>>> I think something else maybe be missing here: assuming L0 handles
>>>> exits
>>>>>>>> for L2 without involving L1 (e.g. external interrupts or ept
>>>>>> violations),
>>>>>>>> then, we may spend some cycles in L0 handling these exits. Note L1 is
>>>>>> not
>>>>>>>> aware of these exits and from L1 perspective L2 was running on the
>>>> CPU.
>>>>>>>> That means that we may need to reduce these cycles spent at
>>>>>>>> L0 from the preemtion timer or emulate a preemption timer exit to
>>>>>>>> force a transition to L1 instead of resuming L2.
>>>>>>>
>>>>>>> That's precisely what the logic I described should achieve: reload the
>>>>>>> value we saved on L2 exit on reentry.
>>>>>>
>>>>>> But don't you think we should also reduce the cycles spent at L0 from
>>>> the
>>>>>> preemption timer ? I mean, if we spent X cycles at L0 handling a L2
>>>> exit
>>>>>> which was not forwarded to L1, then, before we resume L2,
>>>>>> the preemption timer should be: (previous_value_on_exit - X).
>>>>>> If (previous_value_on_exit - X) < 0, then we should force ("emulate") a
>>>>>> preemption timer exit between L2 and L1.
>>>>>
>>>>> We ask the hardware to save the value of the preemption on L2 exit. This
>>>>> value will be exposed to L1 (if it asked for saving as well) and/or be
>>>>> written back to the hardware on L2 reenty (unless L1 had a chance to run
>>>>> and modified it). So the time spent in L0 is implicitly subtracted.
>>>>
>>>> I think you are suggesting the following, please correct me if I am wrong.
>>>> 1) L1 resumes L2 with preemption timer enabled
>>>> 2) L0 emulates the resume/launch
>>>> 3) L2 runs for Y cycles until an external interrupt occurs (Y < preemption
>>>> timer specified by L1)
>>>> 4) L0 saved the preemption timer (original value - Y)
>>>> 5) L0 spends X cycles handling the external interrupt
>>>> 6) L0 resumes L2 with preemption timer = original value - Y
>>>>
>>>> Note that in this case "X is ignored".
>>>>
>>>> I was suggesting to do the following:
>>>> 6) If original value - Y - X > 0 then
>>>>  L0 resumes L2 with preemption timer = original value - Y - X
>>>> else
>>>>  L0 emulates a L2->L1 preemption timer exit (resumes L1)
>>> Yes, your description is right. But I'm also thinking about my
>>> previous consideration, why should we consider such X cycles as what
>>> L2 spent. For nested VMX. external interrupt is not provided by L1, it
>>> is triggered from L0 and want to cause periodically exit to L1, L2 is
>>> "accidentally injure" actually. Since these interrupts are not
>>> generated from L1 and not attend to affect L2, these cycles should not
>>> be treated as what L2 spent. Though these cycles are "spent" in view
>>> of L1, but they should not be taken into consideration in nested VMX.
>>>
>>> For another example, if vcpu scheduled out when L0 handing such
>>> interrupts and CPU does some other things then schedule this vcpu
>>> again, these cycles of executing other processes should not be treated
>>> as what L2 spent definitely.
>>
>> Think of your preemption timer test case: There you are indirectly
>> comparing the timer value against the TSC by checking the a preemption
>> timer exit happened after no more than n TSC cycles. But as the TSC L1
>> and L2 sees continued to tick while in L0, this test could now fail when
>> we leave out the L0 cycles.
>>
>> An alternative would be to hide all L0 TSC cycles from the guest. But
>> that's not the way KVM works, independent of the preemption timer case.
>>
>> BTW, you should use guest_read_tsc() on exit/entry of L2 in order to
>> calculate the time spent in L0. This will ensure that potential tweaks
>> of TSC_OFFSET that L0 might have applied in the meantime will be taken
>> into account.
> Well, in this case, these X cycles is actually not in L1 and L2, but
> it is treated that L2 consumes them, which seems like these cycles are
> "stolen".

Yes, they are stolen by L0 from L2.

Jan

Attachment:
signature.asc

Description: OpenPGP digital signature