Re: [PATCH v3] KVM: nVMX: Fully support of nested VMX preemption timer

On Thu, Sep 5, 2013 at 5:24 PM, Zhang, Yang Z <yang.z.zhang@xxxxxxxxx> wrote:
> Arthur Chunqi Li wrote on 2013-09-05:
>> On Thu, Sep 5, 2013 at 3:45 PM, Zhang, Yang Z <yang.z.zhang@xxxxxxxxx>
>> wrote:
>> > Arthur Chunqi Li wrote on 2013-09-04:
>> >> This patch contains the following two changes:
>> >> 1. Fix a bug in nested preemption timer support. If a vmexit from L2 to L0
>> >> happens for a reason not emulated by L1, the preemption timer value should
>> >> be saved across such exits.
>> >> 2. Add support for the "Save VMX-preemption timer value" VM-exit control
>> >> to nVMX.
>> >>
>> >> With this patch, nested VMX preemption timer features are fully supported.
>> >>
>> >> Signed-off-by: Arthur Chunqi Li <yzt356@xxxxxxxxx>
>> >> ---
>> >> This series depends on queue.
>> >>
>> >>  arch/x86/include/uapi/asm/msr-index.h |    1 +
>> >>  arch/x86/kvm/vmx.c                    |   51 ++++++++++++++++++++++++++++++---
>> >>  2 files changed, 48 insertions(+), 4 deletions(-)
>> >>
>> >> diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
>> >> index bb04650..b93e09a 100644
>> >> --- a/arch/x86/include/uapi/asm/msr-index.h
>> >> +++ b/arch/x86/include/uapi/asm/msr-index.h
>> >> @@ -536,6 +536,7 @@
>> >>
>> >>  /* MSR_IA32_VMX_MISC bits */
>> >>  #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << 29)
>> >> +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE   0x1F
>> >>  /* AMD-V MSRs */
>> >>
>> >>  #define MSR_VM_CR                       0xc0010114
>> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> >> index 1f1da43..870caa8 100644
>> >> --- a/arch/x86/kvm/vmx.c
>> >> +++ b/arch/x86/kvm/vmx.c
>> >> @@ -2204,7 +2204,14 @@ static __init void nested_vmx_setup_ctls_msrs(void)
>> >>  #ifdef CONFIG_X86_64
>> >>               VM_EXIT_HOST_ADDR_SPACE_SIZE |
>> >>  #endif
>> >> -             VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
>> >> +             VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT |
>> >> +             VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
>> >> +     if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER))
>> >> +             nested_vmx_exit_ctls_high &=
>> >> +                     (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
>> >> +     if (!(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
>> >> +             nested_vmx_pinbased_ctls_high &=
>> >> +                     (~PIN_BASED_VMX_PREEMPTION_TIMER);
>> > The following logic is clearer:
>> > if (nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER)
>> >         nested_vmx_exit_ctls_high |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
>> My concern here is that this logic would be wrong if the CPU supported
>> PIN_BASED_VMX_PREEMPTION_TIMER but not
>> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though I don't know whether that
>> ever occurs. The code above reads the MSR and keeps only the features
>> the hardware supports, and here I just check that these two features
>> are supported simultaneously.
>>
> No. Only VM_EXIT_SAVE_VMX_PREEMPTION_TIMER depends on PIN_BASED_VMX_PREEMPTION_TIMER; PIN_BASED_VMX_PREEMPTION_TIMER is an independent feature.
>
>> You remind me that this piece of code could be written like this:
>> if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
>>                 !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
>>         nested_vmx_exit_ctls_high &= (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
>>         nested_vmx_pinbased_ctls_high &= (~PIN_BASED_VMX_PREEMPTION_TIMER);
>> }
>>
>> This better reflects the logic I describe above, that these two flags must be
>> supported simultaneously, and brings less confusion.
>> >
>> > BTW: I don't see that nested_vmx_setup_ctls_msrs() considers the hardware's
>> > capability when exposing those VMX features (not just the preemption timer) to L1.
>> The code just above here, when setting the pin-based controls for nested VMX,
>> first does "rdmsr MSR_IA32_VMX_PINBASED_CTLS" and then uses the result to mask
>> off the features the hardware does not support. The same goes for the other
>> control fields.
>> >
> Yes, I saw it.
>
>> >>       nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
>> >>                                     VM_EXIT_LOAD_IA32_EFER);
>> >>
>> >> @@ -6707,6 +6714,23 @@ static void vmx_get_exit_info(struct kvm_vcpu *vcpu, u64 *info1, u64 *info2)
>> >>       *info2 = vmcs_read32(VM_EXIT_INTR_INFO);
>> >>  }
>> >>
>> >> +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu)
>> >> +{
>> >> +     u64 delta_tsc_l1;
>> >> +     u32 preempt_val_l1, preempt_val_l2, preempt_scale;
>> >> +
>> >> +     preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
>> >> +                     MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
>> >> +     preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
>> >> +     delta_tsc_l1 = kvm_x86_ops->read_l1_tsc(vcpu,
>> >> +                     native_read_tsc()) - vcpu->arch.last_guest_tsc;
>> >> +     preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
>> >> +     if (preempt_val_l2 - preempt_val_l1 < 0)
>> >> +             preempt_val_l2 = 0;
>> >> +     else
>> >> +             preempt_val_l2 -= preempt_val_l1;
>> >> +     vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, preempt_val_l2);
>> >> +}
>> >>  /*
>> >>   * The guest has exited.  See if we can fix it or if we need userspace
>> >>   * assistance.
>> >> @@ -6716,6 +6740,7 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>> >>       struct vcpu_vmx *vmx = to_vmx(vcpu);
>> >>       u32 exit_reason = vmx->exit_reason;
>> >>       u32 vectoring_info = vmx->idt_vectoring_info;
>> >> +     int ret;
>> >>
>> >>       /* If guest state is invalid, start emulating */
>> >>       if (vmx->emulation_required)
>> >> @@ -6795,12 +6820,15 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>> >>
>> >>       if (exit_reason < kvm_vmx_max_exit_handlers
>> >>           && kvm_vmx_exit_handlers[exit_reason])
>> >> -             return kvm_vmx_exit_handlers[exit_reason](vcpu);
>> >> +             ret = kvm_vmx_exit_handlers[exit_reason](vcpu);
>> >>       else {
>> >>               vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
>> >>               vcpu->run->hw.hardware_exit_reason = exit_reason;
>> >> +             ret = 0;
>> >>       }
>> >> -     return 0;
>> >> +     if (is_guest_mode(vcpu))
>> >> +             nested_adjust_preemption_timer(vcpu);
>> > Move this forward to avoid the changes needed for ret.
>> The previous code simply does "return
>> kvm_vmx_exit_handlers[exit_reason](vcpu);", which may also consume CPU
>> time. So putting "nested_adjust_preemption_timer" ahead of it may make
>> the statistics inaccurate.
> Then you should put it before vmentry. Here it is still far from the point of vmentry.
So where is the actual point of vmentry? I'm not quite familiar with
that piece of code.
>
>> >> +     return ret;
>> >>  }
>> >>
>> >>  static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>> >> @@ -7518,6 +7546,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>> >>  {
>> >>       struct vcpu_vmx *vmx = to_vmx(vcpu);
>> >>       u32 exec_control;
>> >> +     u32 exit_control;
>> >>
>> >>       vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector);
>> >>       vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector);
>> >> @@ -7691,7 +7720,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>> >>        * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER
>> >>        * bits are further modified by vmx_set_efer() below.
>> >>        */
>> >> -     vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
>> >> +     exit_control = vmcs_config.vmexit_ctrl;
>> >> +     if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER)
>> >> +             exit_control |= VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
>> >> +     vmcs_write32(VM_EXIT_CONTROLS, exit_control);
>> > And here there would be a problem if the host doesn't support
>> > VM_EXIT_SAVE_VMX_PREEMPTION_TIMER.
>> Nested VMX does check hardware support for these features in
>> "nested_vmx_setup_ctls_msrs"; see my comments above.
> But here you don't check it. You just set it unconditionally.
The check in nested_vmx_setup_ctls_msrs() ensures that nested VMX can
enable the preemption timer only if both of these features are
supported by hardware. That is to say, if L2 can set
PIN_BASED_VMX_PREEMPTION_TIMER without failure, the hardware must
support VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; this is guaranteed by
nested_vmx_setup_ctls_msrs(). (This is also why I wrote the code that
way in nested_vmx_setup_ctls_msrs().)
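In other words, the invariant I want nested_vmx_setup_ctls_msrs() to
maintain is exactly the combined form I sketched earlier in this thread
(same variables as in the hunk above, written out here only for
clarity):

    /*
     * Expose the two preemption-timer bits to L1 only as a pair: if the
     * hardware lacks either one, hide both, so L1 can never enable
     * PIN_BASED_VMX_PREEMPTION_TIMER on a host where we cannot also use
     * VM_EXIT_SAVE_VMX_PREEMPTION_TIMER.
     */
    if (!(nested_vmx_pinbased_ctls_high & PIN_BASED_VMX_PREEMPTION_TIMER) ||
        !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
            nested_vmx_exit_ctls_high &= ~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
            nested_vmx_pinbased_ctls_high &= ~PIN_BASED_VMX_PREEMPTION_TIMER;
    }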
>
>> >
>> >>
>> >>       /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are
>> >>        * emulated by vmx_set_efer(), below.
>> >> @@ -8090,6 +8122,17 @@ static void prepare_vmcs12(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12)
>> >>       vmcs12->guest_pending_dbg_exceptions =
>> >>               vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
>> >>
>> >> +     if (vmcs12->pin_based_vm_exec_control &
>> >> +                     PIN_BASED_VMX_PREEMPTION_TIMER) {
>> >> +             if (vmcs12->vm_exit_controls &
>> >> +                             VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
>> >> +                     vmcs12->vmx_preemption_timer_value =
>> >> +                             vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
>> >> +             else
>> >> +                     vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
>> >> +                             vmcs12->vmx_preemption_timer_value);
>> > Why write it to the vmcs directly if VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
>> > is not set?
>> Yes, the write is needless here since vmcs02 will be re-prepared via
>> prepare_vmcs02() on L1->L2. This function just saves the information
>> that is needed; the vmcs_write is useless.
>>
> I'm not sure I see your point. From my point of view, even if VM_EXIT_SAVE_VMX_PREEMPTION_TIMER is not set, you still need to save the value to vmcs12->vmx_preemption_timer_value. Otherwise, prepare_vmcs02 cannot get the right value.
If L2 doesn't enable VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, its value will
be reset to what we set initially. This function is called on an L2->L1
exit, so we should emulate L2's real exit process here. What we do in
the other parts is handle the L2->L0->L2 vmexit path, which is not a
"real" L2 vmexit.

Arthur
>
>> Arthur
>> >
>> >> +     }
>> >> +
>> >>       /*
>> >>        * In some cases (usually, nested EPT), L2 is allowed to change its
>> >>        * own CR3 without exiting. If it has changed it, we must keep it.
>> >> --
>> >> 1.7.9.5
>> >>
>> >
>> > Best regards,
>> > Yang
>> >
>> >
>
> Best regards,
> Yang
>
>