Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer

Jan Kiszka <jan.kiszka@xxxxxx> · Sun, 25 Aug 2013 10:13:43 +0200

On 2013-08-25 10:04, Abel Gordon wrote:
> 
> 
> kvm-owner@xxxxxxxxxxxxxxx wrote on 25/08/2013 10:55:24 AM:
> 
>> From: Arthur Chunqi Li <yzt356@xxxxxxxxx>
>> To: Abel Gordon/Haifa/IBM@IBMIL,
>> Cc: Jan Kiszka <jan.kiszka@xxxxxx>, Gleb Natapov <gleb@xxxxxxxxxx>,
>> kvm <kvm@xxxxxxxxxxxxxxx>, kvm-owner@xxxxxxxxxxxxxxx, Paolo Bonzini
>> <pbonzini@xxxxxxxxxx>
>> Date: 25/08/2013 10:55 AM
>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
>> Sent by: kvm-owner@xxxxxxxxxxxxxxx
>>
>> On Sun, Aug 25, 2013 at 3:50 PM, Abel Gordon <ABELG@xxxxxxxxxx> wrote:
>>>
>>>
>>> kvm-owner@xxxxxxxxxxxxxxx wrote on 25/08/2013 10:43:12 AM:
>>>
>>>> From: Jan Kiszka <jan.kiszka@xxxxxx>
>>>> To: Abel Gordon/Haifa/IBM@IBMIL,
>>>> Cc: gleb@xxxxxxxxxx, kvm@xxxxxxxxxxxxxxx, kvm-owner@xxxxxxxxxxxxxxx,
>>>> pbonzini@xxxxxxxxxx, "李春奇 <Arthur Chunqi Li>" <yzt356@xxxxxxxxx>
>>>> Date: 25/08/2013 10:43 AM
>>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
>>>> Sent by: kvm-owner@xxxxxxxxxxxxxxx
>>>>
>>>> On 2013-08-25 09:37, Abel Gordon wrote:
>>>>>
>>>>>
>>>>>> From: Jan Kiszka <jan.kiszka@xxxxxx>
>>>>>> To: "李春奇 <Arthur Chunqi Li>"  <yzt356@xxxxxxxxx>,
>>>>>> Cc: kvm@xxxxxxxxxxxxxxx, gleb@xxxxxxxxxx, pbonzini@xxxxxxxxxx
>>>>>> Date: 25/08/2013 09:44 AM
>>>>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption timer
>>>>>> Sent by: kvm-owner@xxxxxxxxxxxxxxx
>>>>>>
>>>>>> On 2013-08-24 20:44, root wrote:
>>>>>>> This patch contains the following two changes:
>>>>>>> 1. Fix a bug in nested preemption timer support. If a vmexit L2->L0
>>>>>>> occurs for a reason not emulated by L1, the preemption timer value
>>>>>>> should be saved on such exits.
>>>>>>> 2. Add support for the "Save VMX-preemption timer value" VM-exit
>>>>>>> control to nVMX.
>>>>>>>
>>>>>>> With this patch, nested VMX preemption timer features are fully
>>>>>>> supported.
>>>>>>>
>>>>>>> Signed-off-by: Arthur Chunqi Li <yzt356@xxxxxxxxx>
>>>>>>> ---
>>>>>
>>>>>>>
>>>>>>> @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>>>>>>>        (vmcs_config.pin_based_exec_ctrl |
>>>>>>>         vmcs12->pin_based_vm_exec_control));
>>>>>>>
>>>>>>> -   if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
>>>>>>> -      vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
>>>>>>> -              vmcs12->vmx_preemption_timer_value);
>>>>>>> +   if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) {
>>>>>>> +      if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
>>>>>>> +         vmcs12->vmx_preemption_timer_value =
>>>>>>> +            vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
>>>>>>> +      else
>>>>>>> +         vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
>>>>>>> +               vmcs12->vmx_preemption_timer_value);
>>>>>>> +   }
>>>>>>
>>>>>> This is not correct. We still need to write
>>>>>> vmx_preemption_timer_value into the VMCS on entry. The difference is
>>>>>> that, on exit from L2, vmx_preemption_timer_value has to be updated
>>>>>> according to the saved hardware state. The corresponding code is
>>>>>> missing in your patch so far.
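
The entry/exit split described above can be sketched as a small user-space model. This is not kernel code: the struct and function names below are hypothetical stand-ins for vmcs12, the hardware VMCS field, and the nested entry/exit paths, and the model only covers the case where L1 enabled the save control.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant vmcs12 state (not the kernel struct). */
struct vmcs12_model {
    bool     preemption_timer_enabled;   /* models PIN_BASED_VMX_PREEMPTION_TIMER */
    bool     save_timer_on_exit;         /* models VM_EXIT_SAVE_VMX_PREEMPTION_TIMER */
    uint32_t vmx_preemption_timer_value;
};

/* Hypothetical stand-in for the hardware VMCS used to run L2. */
struct hw_vmcs_model {
    uint32_t preemption_timer_value;     /* models VMX_PREEMPTION_TIMER_VALUE */
};

/* Entry to L2: the timer is always loaded from vmcs12, save control or not. */
static void model_enter_l2(const struct vmcs12_model *v12,
                           struct hw_vmcs_model *hw)
{
    assert(v12 != NULL && hw != NULL);
    if (v12->preemption_timer_enabled)
        hw->preemption_timer_value = v12->vmx_preemption_timer_value;
}

/* Exit from L2: copy the remaining hardware count back into vmcs12. */
static void model_exit_l2(struct vmcs12_model *v12,
                          const struct hw_vmcs_model *hw)
{
    assert(v12 != NULL && hw != NULL);
    if (v12->preemption_timer_enabled && v12->save_timer_on_exit)
        v12->vmx_preemption_timer_value = hw->preemption_timer_value;
}
```

In this model the read happens only on the exit path and the write only on the entry path, which is the opposite of branching between read and write inside the entry path.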
>>>>>
>>>>> I think something else may be missing here: assuming L0 handles exits
>>>>> for L2 without involving L1 (e.g. external interrupts or EPT
>>>>> violations), we may spend some cycles in L0 handling these exits. Note
>>>>> that L1 is not aware of these exits, and from L1's perspective L2 was
>>>>> running on the CPU the whole time. That means we may need to subtract
>>>>> the cycles spent in L0 from the preemption timer, or emulate a
>>>>> preemption timer exit to force a transition to L1 instead of resuming
>>>>> L2.
>>>>
>>>> That's precisely what the logic I described should achieve: reload the
>>>> value we saved on L2 exit on reentry.
>>>
>>> But don't you think we should also subtract the cycles spent in L0 from
>>> the preemption timer? I mean, if we spent X cycles in L0 handling an L2
>>> exit which was not forwarded to L1, then, before we resume L2,
>>> the preemption timer should be: (previous_value_on_exit - X).
>>> If (previous_value_on_exit - X) < 0, then we should force ("emulate") a
>>> preemption timer exit between L2 and L1.
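
A minimal sketch of that adjustment, assuming X is already expressed in preemption-timer units. The helper name and its signature are hypothetical and do not come from the patch. Because the timer value is unsigned, the sketch compares before subtracting instead of testing the difference for a negative result.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: charge the time spent in L0 against L2's timer.
 *
 * value_on_exit: timer value saved when L2 exited to L0
 * ticks_in_l0:   time spent in L0 (X), in preemption-timer units
 * new_value:     receives the value to load before resuming L2
 *
 * Returns true if L2 can be resumed with *new_value, false if
 * (value_on_exit - X) would be negative, in which case the caller
 * should emulate a preemption timer exit from L2 to L1.
 */
static bool adjust_timer_for_l0(uint32_t value_on_exit, uint64_t ticks_in_l0,
                                uint32_t *new_value)
{
    assert(new_value != NULL);

    if (ticks_in_l0 > value_on_exit) {
        *new_value = 0;
        return false;
    }

    /* Safe: ticks_in_l0 <= value_on_exit, so it fits in 32 bits. */
    *new_value = value_on_exit - (uint32_t)ticks_in_l0;
    return true;
}
```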
>> Sorry, I previously misunderstood your comments. But why should we
>> need to subtract the cycles spent in L0 from L2's preemption timer value?
>> These cycles are not spent by L2, so they should not be charged to L2.
> 
> L1 asked the "hardware" (emulated by L0) to run L2 and force an exit
> after "Y" cycles. Now, in practice, we may spend "X" cycles in L0 handling
> exits without switching to L1. That means that, from L1's perspective, L2
> was running for all these X cycles. L1 should assume that the instructions
> per cycle the CPU executed decreased, but the cycles were spent. That's why
> I believe you should take these X cycles into account.
> 

Now I get it. There is likely some truth in this as the reference clock
for the preemption timer, the TSC, isn't stopped for L1/L2 while running
in L0. And the SDM demands the countdown to be proportional to that clock.
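
To make the proportionality concrete: per the SDM, the timer counts down by 1 each time bit N of the TSC changes, where N is reported in bits 4:0 of the IA32_VMX_MISC MSR. A rough sketch of converting a TSC interval spent in L0 into timer units follows. The function name is made up for illustration, and TSC wraparound is ignored.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: number of preemption-timer decrements between two
 * TSC readings. rate_shift is the bit position N taken from
 * IA32_VMX_MISC[4:0]. Bit N of the TSC changes once per increment of
 * (tsc >> N), so the decrement count is the difference of the shifted
 * readings. Assumes tsc_end >= tsc_start (no wraparound).
 */
static uint64_t timer_ticks_elapsed(uint64_t tsc_start, uint64_t tsc_end,
                                    unsigned int rate_shift)
{
    assert(rate_shift < 32);
    assert(tsc_end >= tsc_start);
    return (tsc_end >> rate_shift) - (tsc_start >> rate_shift);
}
```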

Jan
