RE: [PATCH v3] KVM: nVMX: Fully support of nested VMX preemption timer

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Arthur Chunqi Li wrote on 2013-09-05:
> > Arthur Chunqi Li wrote on 2013-09-05:
> >> On Thu, Sep 5, 2013 at 3:45 PM, Zhang, Yang Z
> >> <yang.z.zhang@xxxxxxxxx>
> >> wrote:
> >> > Arthur Chunqi Li wrote on 2013-09-04:
> >> >> This patch contains the following two changes:
> >> >> 1. Fix the bug in nested preemption timer support. If vmexit
> >> >> L2->L0 with some reasons not emulated by L1, preemption timer
> >> >> value should be save in such exits.
> >> >> 2. Add support of "Save VMX-preemption timer value" VM-Exit
> >> >> controls to nVMX.
> >> >>
> >> >> With this patch, nested VMX preemption timer features are fully
> supported.
> >> >>
> >> >> Signed-off-by: Arthur Chunqi Li <yzt356@xxxxxxxxx>
> >> >> ---
> >> >> This series depends on queue.
> >> >>
> >> >>  arch/x86/include/uapi/asm/msr-index.h |    1 +
> >> >>  arch/x86/kvm/vmx.c                    |   51
> >> >> ++++++++++++++++++++++++++++++---
> >> >>  2 files changed, 48 insertions(+), 4 deletions(-)
> >> >>
> >> >> diff --git a/arch/x86/include/uapi/asm/msr-index.h
> >> >> b/arch/x86/include/uapi/asm/msr-index.h
> >> >> index bb04650..b93e09a 100644
> >> >> --- a/arch/x86/include/uapi/asm/msr-index.h
> >> >> +++ b/arch/x86/include/uapi/asm/msr-index.h
> >> >> @@ -536,6 +536,7 @@
> >> >>
> >> >>  /* MSR_IA32_VMX_MISC bits */
> >> >>  #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL
> <<
> >> 29)
> >> >> +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE   0x1F
> >> >>  /* AMD-V MSRs */
> >> >>
> >> >>  #define MSR_VM_CR                       0xc0010114
> >> >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index
> >> >> 1f1da43..870caa8
> >> >> 100644
> >> >> --- a/arch/x86/kvm/vmx.c
> >> >> +++ b/arch/x86/kvm/vmx.c
> >> >> @@ -2204,7 +2204,14 @@ static __init void
> >> >> nested_vmx_setup_ctls_msrs(void)  #ifdef CONFIG_X86_64
> >> >>               VM_EXIT_HOST_ADDR_SPACE_SIZE |  #endif
> >> >> -             VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT;
> >> >> +             VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT
> |
> >> >> +             VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
> >> >> +     if (!(nested_vmx_pinbased_ctls_high &
> >> >> PIN_BASED_VMX_PREEMPTION_TIMER))
> >> >> +             nested_vmx_exit_ctls_high &=
> >> >> +
> (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
> >> >> +     if (!(nested_vmx_exit_ctls_high &
> >> >> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER))
> >> >> +             nested_vmx_pinbased_ctls_high &=
> >> >> +                     (~PIN_BASED_VMX_PREEMPTION_TIMER);
> >> > The following logic is more clearly:
> >> > if(nested_vmx_pinbased_ctls_high &
> >> PIN_BASED_VMX_PREEMPTION_TIMER)
> >> >         nested_vmx_exit_ctls_high |=
> >> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
> >> Here I have such consideration: this logic is wrong if CPU support
> >> PIN_BASED_VMX_PREEMPTION_TIMER but doesn't support
> >> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though I don't know if this
> does
> >> occurs. So the codes above reads the MSR and reserves the features it
> >> supports, and here I just check if these two features are supported
> >> simultaneously.
> >>
> > No. Only VM_EXIT_SAVE_VMX_PREEMPTION_TIMER depends on
> > PIN_BASED_VMX_PREEMPTION_TIMER.
> PIN_BASED_VMX_PREEMPTION_TIMER is an
> > independent feature
> >
> >> You remind that this piece of codes can write like this:
> >> if (!(nested_vmx_pin_based_ctls_high &
> >> PIN_BASED_VMX_PREEMPTION_TIMER) ||
> >>                 !(nested_vmx_exit_ctls_high &
> >> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) {
> >>         nested_vmx_exit_ctls_high
> >> &=(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER);
> >>         nested_vmx_pinbased_ctls_high &=
> >> (~PIN_BASED_VMX_PREEMPTION_TIMER);
> >> }
> >>
> >> This may reflect the logic I describe above that these two flags
> >> should support simultaneously, and brings less confusion.
> >> >
> >> > BTW: I don't see nested_vmx_setup_ctls_msrs() considers the
> >> > hardware's
> >> capability when expose those vmx features(not just preemption timer) to L1.
> >> The codes just above here, when setting pinbased control for nested
> >> vmx, it firstly "rdmsr MSR_IA32_VMX_PINBASED_CTLS", then use this to
> >> mask the features hardware not support. So does other control fields.
> >> >
> > Yes, I saw it.
> >
> >> >>       nested_vmx_exit_ctls_high |=
> >> >> (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR |
> >> >>
> VM_EXIT_LOAD_IA32_EFER);
> >> >>
> >> >> @@ -6707,6 +6714,23 @@ static void vmx_get_exit_info(struct
> >> >> kvm_vcpu *vcpu, u64 *info1, u64 *info2)
> >> >>       *info2 = vmcs_read32(VM_EXIT_INTR_INFO);  }
> >> >>
> >> >> +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu) {
> >> >> +     u64 delta_tsc_l1;
> >> >> +     u32 preempt_val_l1, preempt_val_l2, preempt_scale;
> >> >> +
> >> >> +     preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) &
> >> >> +
> >> MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE;
> >> >> +     preempt_val_l2 =
> vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
> >> >> +     delta_tsc_l1 = kvm_x86_ops->read_l1_tsc(vcpu,
> >> >> +                     native_read_tsc()) -
> vcpu->arch.last_guest_tsc;
> >> >> +     preempt_val_l1 = delta_tsc_l1 >> preempt_scale;
> >> >> +     if (preempt_val_l2 - preempt_val_l1 < 0)
> >> >> +             preempt_val_l2 = 0;
> >> >> +     else
> >> >> +             preempt_val_l2 -= preempt_val_l1;
> >> >> +     vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
> >> preempt_val_l2); }
> >> >>  /*
> >> >>   * The guest has exited.  See if we can fix it or if we need userspace
> >> >>   * assistance.
> >> >> @@ -6716,6 +6740,7 @@ static int vmx_handle_exit(struct kvm_vcpu
> >> *vcpu)
> >> >>       struct vcpu_vmx *vmx = to_vmx(vcpu);
> >> >>       u32 exit_reason = vmx->exit_reason;
> >> >>       u32 vectoring_info = vmx->idt_vectoring_info;
> >> >> +     int ret;
> >> >>
> >> >>       /* If guest state is invalid, start emulating */
> >> >>       if (vmx->emulation_required) @@ -6795,12 +6820,15 @@ static
> >> >> int vmx_handle_exit(struct kvm_vcpu
> >> >> *vcpu)
> >> >>
> >> >>       if (exit_reason < kvm_vmx_max_exit_handlers
> >> >>           && kvm_vmx_exit_handlers[exit_reason])
> >> >> -             return kvm_vmx_exit_handlers[exit_reason](vcpu);
> >> >> +             ret = kvm_vmx_exit_handlers[exit_reason](vcpu);
> >> >>       else {
> >> >>               vcpu->run->exit_reason = KVM_EXIT_UNKNOWN;
> >> >>               vcpu->run->hw.hardware_exit_reason = exit_reason;
> >> >> +             ret = 0;
> >> >>       }
> >> >> -     return 0;
> >> >> +     if (is_guest_mode(vcpu))
> >> >> +             nested_adjust_preemption_timer(vcpu);
> >> > Move this forward to avoid the changes for ret.
> >> The previous codes simply "return
> >> kvm_vmx_exit_handlers[exit_reason](vcpu);", which may also consumes
> >> CPU times. So put "nested_adjust_preemption_timer" ahead may cause
> >> the statistics inaccuracy.
> > Then you should put it before vmentry. Here still far from the point of
> vmentry.
> So where is the actual point of vmentry? I'm not quite familiar with that piece
> of codes.
vmx_vcpu_run should be the right place.

> >
> >> >> +     return ret;
> >> >>  }
> >> >>
> >> >>  static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr,
> >> >> int
> >> >> irr) @@
> >> >> -7518,6 +7546,7 @@ static void prepare_vmcs02(struct kvm_vcpu
> >> >> *vcpu, struct vmcs12 *vmcs12)  {
> >> >>       struct vcpu_vmx *vmx = to_vmx(vcpu);
> >> >>       u32 exec_control;
> >> >> +     u32 exit_control;
> >> >>
> >> >>       vmcs_write16(GUEST_ES_SELECTOR,
> vmcs12->guest_es_selector);
> >> >>       vmcs_write16(GUEST_CS_SELECTOR,
> vmcs12->guest_cs_selector);
> >> @@
> >> >> -7691,7 +7720,10 @@ static void prepare_vmcs02(struct kvm_vcpu
> >> >> *vcpu, struct vmcs12 *vmcs12)
> >> >>        * we should use its exit controls. Note that
> >> VM_EXIT_LOAD_IA32_EFER
> >> >>        * bits are further modified by vmx_set_efer() below.
> >> >>        */
> >> >> -     vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl);
> >> >> +     exit_control = vmcs_config.vmexit_ctrl;
> >> >> +     if (vmcs12->pin_based_vm_exec_control &
> >> >> PIN_BASED_VMX_PREEMPTION_TIMER)
> >> >> +             exit_control |=
> >> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER;
> >> >> +     vmcs_write32(VM_EXIT_CONTROLS, exit_control);
> >> > And here should be problem if host doesn't support
> >> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER.
> >> Nested vmx does check the hardware support of these features in
> >> "nested_vmx_setup_ctls_msrs", see my comments above.
> > But here you don't check it. You just set it unconditionally.
> What we have checked in nested_vmx_setup_ctls_msrs() confirms that only if
> these two features supported by hardware, nested vmx can enable preemption
> timer. That is to say, if L2 can set PIN_BASED_VMX_PREEMPTION_TIMER
> without failure, hardware must support
> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, which should guarantee in
> nested_vmx_setup_ctls_msrs (). (This is also why I write code like that in
> nested_vmx_setup_ctls_msrs ()).
I see your point. But the fact is PIN_BASED_VMX_PREEMPTION_TIMER is an independent feature. So your previous assumption is wrong. The hardware may support PIN_BASED_VMX_PREEMPTION_TIMER without VM_EXIT_SAVE_VMX_PREEMPTION_TIMER

> >
> >> >
> >> >>
> >> >>       /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and
> >> VM_ENTRY_IA32E_MODE are
> >> >>        * emulated by vmx_set_efer(), below.
> >> >> @@ -8090,6 +8122,17 @@ static void prepare_vmcs12(struct kvm_vcpu
> >> >> *vcpu, struct vmcs12 *vmcs12)
> >> >>       vmcs12->guest_pending_dbg_exceptions =
> >> >>               vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS);
> >> >>
> >> >> +     if (vmcs12->pin_based_vm_exec_control &
> >> >> +                     PIN_BASED_VMX_PREEMPTION_TIMER) {
> >> >> +             if (vmcs12->vm_exit_controls &
> >> >> +
> >> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)
> >> >> +                     vmcs12->vmx_preemption_timer_value =
> >> >> +
> >> vmcs_read32(VMX_PREEMPTION_TIMER_VALUE);
> >> >> +             else
> >> >> +
> >> vmcs_write32(VMX_PREEMPTION_TIMER_VALUE,
> >> >> +
> >> >> + vmcs12->vmx_preemption_timer_value);
> >> > Why write it to vmcs directly if
> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER
> >> not set?
> >> Yes, writing is needless here since vmcs02 will be re-prepared via
> >> prepare_vmcs02() when L1->L2. This function just save information
> >> needed, vmcs_write is useless.
> >>
> > I don't sure I am seeing your point. Per my point, even the
> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER is not setting, you still need to
> save the value to vmcs12->vmx_preemption_timer_value. Or else,
> prepare_vmcs02 cannot get the right value.
> If L2 doesn't enable VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, its value will
> be reset to what we set initially. This function is called due to
> L2->L1, so we should emulate L2's real exit process here. What we have
> done in other parts is to handle vmexit of L2->L0->L2, which is not the "real" L2
> vmexit.
I am not describing it clearly. I mean if VM_EXIT_SAVE_VMX_PREEMPTION_TIMER is not set, then prepare_vmcs02() always write the initial value which means vmx_preemption_timer_value to vmcs02 before L2 running. So the vmcs wirte here is redundant. Am I wrong?

> 
> Arthur
> >
> >> Arthur
> >> >
> >> >> +     }
> >> >> +
> >> >>       /*
> >> >>        * In some cases (usually, nested EPT), L2 is allowed to change
> its
> >> >>        * own CR3 without exiting. If it has changed it, we must keep it.
> >> >> --
> >> >> 1.7.9.5
> >> >>
> >> >> --
> >> >> To unsubscribe from this list: send the line "unsubscribe kvm" in
> >> >> the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo
> >> >> info at http://vger.kernel.org/majordomo-info.html
> >> >
> >> > Best regards,
> >> > Yang
> >> >
> >> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe kvm" in the
> >> body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at
> >> http://vger.kernel.org/majordomo-info.html
> >
> > Best regards,
> > Yang
> >
> >

Best regards,
Yang


��.n��������+%������w��{.n�����o�^n�r������&��z�ޗ�zf���h���~����������_��+v���)ߣ�


[Index of Archives]     [KVM ARM]     [KVM ia64]     [KVM ppc]     [Virtualization Tools]     [Spice Development]     [Libvirt]     [Libvirt Users]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite Questions]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux