On Sun, Aug 25, 2013 at 4:53 PM, Jan Kiszka <jan.kiszka@xxxxxx> wrote: > On 2013-08-25 10:41, Arthur Chunqi Li wrote: >> On Sun, Aug 25, 2013 at 4:18 PM, Abel Gordon <ABELG@xxxxxxxxxx> wrote: >>> >>> >>> kvm-owner@xxxxxxxxxxxxxxx wrote on 25/08/2013 10:54:13 AM: >>> >>>> From: Jan Kiszka <jan.kiszka@xxxxxx> >>>> To: Abel Gordon/Haifa/IBM@IBMIL, >>>> Cc: gleb@xxxxxxxxxx, kvm <kvm@xxxxxxxxxxxxxxx>, pbonzini@xxxxxxxxxx, >>>> "李春奇 <Arthur Chunqi Li>" <yzt356@xxxxxxxxx> >>>> Date: 25/08/2013 10:54 AM >>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption >>> timer >>>> Sent by: kvm-owner@xxxxxxxxxxxxxxx >>>> >>>> On 2013-08-25 09:50, Abel Gordon wrote: >>>>> >>>>> >>>>> kvm-owner@xxxxxxxxxxxxxxx wrote on 25/08/2013 10:43:12 AM: >>>>> >>>>>> From: Jan Kiszka <jan.kiszka@xxxxxx> >>>>>> To: Abel Gordon/Haifa/IBM@IBMIL, >>>>>> Cc: gleb@xxxxxxxxxx, kvm@xxxxxxxxxxxxxxx, kvm-owner@xxxxxxxxxxxxxxx, >>>>>> pbonzini@xxxxxxxxxx, "李春奇 <Arthur Chunqi Li>" <yzt356@xxxxxxxxx> >>>>>> Date: 25/08/2013 10:43 AM >>>>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX preemption >>>>> timer >>>>>> Sent by: kvm-owner@xxxxxxxxxxxxxxx >>>>>> >>>>>> On 2013-08-25 09:37, Abel Gordon wrote: >>>>>>> >>>>>>> >>>>>>>> From: Jan Kiszka <jan.kiszka@xxxxxx> >>>>>>>> To: "李春奇 <Arthur Chunqi Li>" <yzt356@xxxxxxxxx>, >>>>>>>> Cc: kvm@xxxxxxxxxxxxxxx, gleb@xxxxxxxxxx, pbonzini@xxxxxxxxxx >>>>>>>> Date: 25/08/2013 09:44 AM >>>>>>>> Subject: Re: [PATCH] KVM: nVMX: Fully support of nested VMX >>> preemption >>>>>>> timer >>>>>>>> Sent by: kvm-owner@xxxxxxxxxxxxxxx >>>>>>>> >>>>>>>> On 2013-08-24 20:44, root wrote: >>>>>>>>> This patch contains the following two changes: >>>>>>>>> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 >>>>>>>>> with some reasons not emulated by L1, preemption timer value should >>>>>>>>> be save in such exits. >>>>>>>>> 2. Add support of "Save VMX-preemption timer value" VM-Exit >>> controls >>>>>>>>> to nVMX. >>>>>>>>> >>>>>>>>> With this patch, nested VMX preemption timer features are fully >>>>>>>>> supported. >>>>>>>>> >>>>>>>>> Signed-off-by: Arthur Chunqi Li <yzt356@xxxxxxxxx> >>>>>>>>> --- >>>>>>> >>>>>>>>> >>>>>>>>> @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu >>>>>>>> *vcpu, struct vmcs12 *vmcs12) >>>>>>>>> (vmcs_config.pin_based_exec_ctrl | >>>>>>>>> vmcs12->pin_based_vm_exec_control)); >>>>>>>>> >>>>>>>>> - if (vmcs12->pin_based_vm_exec_control & >>>>>>> PIN_BASED_VMX_PREEMPTION_TIMER) >>>>>>>>> - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, >>>>>>>>> - vmcs12->vmx_preemption_timer_value); >>>>>>>>> + if (vmcs12->pin_based_vm_exec_control & >>>>>>>> PIN_BASED_VMX_PREEMPTION_TIMER) { >>>>>>>>> + if (vmcs12->vm_exit_controls & >>>>>>> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) >>>>>>>>> + vmcs12->vmx_preemption_timer_value = >>>>>>>>> + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); >>>>>>>>> + else >>>>>>>>> + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, >>>>>>>>> + vmcs12->vmx_preemption_timer_value); >>>>>>>>> + } >>>>>>>> >>>>>>>> This is not correct. We still need to set the vmcs to >>>>>>>> vmx_preemption_timer_value. The difference is that, on exit from L2, >>>>>>>> vmx_preemption_timer_value has to be updated according to the saved >>>>>>>> hardware state. The corresponding code is missing in your patch so >>>>> far. >>>>>>> >>>>>>> I think something else maybe be missing here: assuming L0 handles >>> exits >>>>>>> for L2 without involving L1 (e.g. external interrupts or ept >>>>> violations), >>>>>>> then, we may spend some cycles in L0 handling these exits. Note L1 is >>>>> not >>>>>>> aware of these exits and from L1 perspective L2 was running on the >>> CPU. >>>>>>> That means that we may need to reduce these cycles spent at >>>>>>> L0 from the preemtion timer or emulate a preemption timer exit to >>>>>>> force a transition to L1 instead of resuming L2. >>>>>> >>>>>> That's precisely what the logic I described should achieve: reload the >>>>>> value we saved on L2 exit on reentry. >>>>> >>>>> But don't you think we should also reduce the cycles spent at L0 from >>> the >>>>> preemption timer ? I mean, if we spent X cycles at L0 handling a L2 >>> exit >>>>> which was not forwarded to L1, then, before we resume L2, >>>>> the preemption timer should be: (previous_value_on_exit - X). >>>>> If (previous_value_on_exit - X) < 0, then we should force ("emulate") a >>>>> preemption timer exit between L2 and L1. >>>> >>>> We ask the hardware to save the value of the preemption on L2 exit. This >>>> value will be exposed to L1 (if it asked for saving as well) and/or be >>>> written back to the hardware on L2 reenty (unless L1 had a chance to run >>>> and modified it). So the time spent in L0 is implicitly subtracted. >>> >>> I think you are suggesting the following, please correct me if I am wrong. >>> 1) L1 resumes L2 with preemption timer enabled >>> 2) L0 emulates the resume/launch >>> 3) L2 runs for Y cycles until an external interrupt occurs (Y < preemption >>> timer specified by L1) >>> 4) L0 saved the preemption timer (original value - Y) >>> 5) L0 spends X cycles handling the external interrupt >>> 6) L0 resumes L2 with preemption timer = original value - Y >>> >>> Note that in this case "X is ignored". >>> >>> I was suggesting to do the following: >>> 6) If original value - Y - X > 0 then >>> L0 resumes L2 with preemption timer = original value - Y - X >>> else >>> L0 emulates a L2->L1 preemption timer exit (resumes L1) >> Yes, your description is right. But I'm also thinking about my >> previous consideration, why should we consider such X cycles as what >> L2 spent. For nested VMX. external interrupt is not provided by L1, it >> is triggered from L0 and want to cause periodically exit to L1, L2 is >> "accidentally injure" actually. Since these interrupts are not >> generated from L1 and not attend to affect L2, these cycles should not >> be treated as what L2 spent. Though these cycles are "spent" in view >> of L1, but they should not be taken into consideration in nested VMX. >> >> For another example, if vcpu scheduled out when L0 handing such >> interrupts and CPU does some other things then schedule this vcpu >> again, these cycles of executing other processes should not be treated >> as what L2 spent definitely. > > Think of your preemption timer test case: There you are indirectly > comparing the timer value against the TSC by checking the a preemption > timer exit happened after no more than n TSC cycles. But as the TSC L1 > and L2 sees continued to tick while in L0, this test could now fail when > we leave out the L0 cycles. > > An alternative would be to hide all L0 TSC cycles from the guest. But > that's not the way KVM works, independent of the preemption timer case. > > BTW, you should use guest_read_tsc() on exit/entry of L2 in order to > calculate the time spent in L0. This will ensure that potential tweaks > of TSC_OFFSET that L0 might have applied in the meantime will be taken > into account. Well, in this case, these X cycles is actually not in L1 and L2, but it is treated that L2 consumes them, which seems like these cycles are "stolen". Arthur > > Jan > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html