Arthur Chunqi Li wrote on 2013-09-05: > On Thu, Sep 5, 2013 at 3:45 PM, Zhang, Yang Z <yang.z.zhang@xxxxxxxxx> > wrote: > > Arthur Chunqi Li wrote on 2013-09-04: > >> This patch contains the following two changes: > >> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 > >> with some reasons not emulated by L1, preemption timer value should > >> be save in such exits. > >> 2. Add support of "Save VMX-preemption timer value" VM-Exit controls > >> to nVMX. > >> > >> With this patch, nested VMX preemption timer features are fully supported. > >> > >> Signed-off-by: Arthur Chunqi Li <yzt356@xxxxxxxxx> > >> --- > >> This series depends on queue. > >> > >> arch/x86/include/uapi/asm/msr-index.h | 1 + > >> arch/x86/kvm/vmx.c | 51 > >> ++++++++++++++++++++++++++++++--- > >> 2 files changed, 48 insertions(+), 4 deletions(-) > >> > >> diff --git a/arch/x86/include/uapi/asm/msr-index.h > >> b/arch/x86/include/uapi/asm/msr-index.h > >> index bb04650..b93e09a 100644 > >> --- a/arch/x86/include/uapi/asm/msr-index.h > >> +++ b/arch/x86/include/uapi/asm/msr-index.h > >> @@ -536,6 +536,7 @@ > >> > >> /* MSR_IA32_VMX_MISC bits */ > >> #define MSR_IA32_VMX_MISC_VMWRITE_SHADOW_RO_FIELDS (1ULL << > 29) > >> +#define MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE 0x1F > >> /* AMD-V MSRs */ > >> > >> #define MSR_VM_CR 0xc0010114 > >> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index > >> 1f1da43..870caa8 > >> 100644 > >> --- a/arch/x86/kvm/vmx.c > >> +++ b/arch/x86/kvm/vmx.c > >> @@ -2204,7 +2204,14 @@ static __init void > >> nested_vmx_setup_ctls_msrs(void) #ifdef CONFIG_X86_64 > >> VM_EXIT_HOST_ADDR_SPACE_SIZE | #endif > >> - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; > >> + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | > >> + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; > >> + if (!(nested_vmx_pinbased_ctls_high & > >> PIN_BASED_VMX_PREEMPTION_TIMER)) > >> + nested_vmx_exit_ctls_high &= > >> + (~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); > >> + if (!(nested_vmx_exit_ctls_high & > >> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) > >> + nested_vmx_pinbased_ctls_high &= > >> + (~PIN_BASED_VMX_PREEMPTION_TIMER); > > The following logic is more clearly: > > if(nested_vmx_pinbased_ctls_high & > PIN_BASED_VMX_PREEMPTION_TIMER) > > nested_vmx_exit_ctls_high |= > VM_EXIT_SAVE_VMX_PREEMPTION_TIMER > Here I have such consideration: this logic is wrong if CPU support > PIN_BASED_VMX_PREEMPTION_TIMER but doesn't support > VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though I don't know if this does > occurs. So the codes above reads the MSR and reserves the features it > supports, and here I just check if these two features are supported > simultaneously. > No. Only VM_EXIT_SAVE_VMX_PREEMPTION_TIMER depends on PIN_BASED_VMX_PREEMPTION_TIMER. PIN_BASED_VMX_PREEMPTION_TIMER is an independent feature > You remind that this piece of codes can write like this: > if (!(nested_vmx_pin_based_ctls_high & > PIN_BASED_VMX_PREEMPTION_TIMER) || > !(nested_vmx_exit_ctls_high & > VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { > nested_vmx_exit_ctls_high > &=(~VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); > nested_vmx_pinbased_ctls_high &= > (~PIN_BASED_VMX_PREEMPTION_TIMER); > } > > This may reflect the logic I describe above that these two flags should support > simultaneously, and brings less confusion. > > > > BTW: I don't see nested_vmx_setup_ctls_msrs() considers the hardware's > capability when expose those vmx features(not just preemption timer) to L1. > The codes just above here, when setting pinbased control for nested vmx, it > firstly "rdmsr MSR_IA32_VMX_PINBASED_CTLS", then use this to mask the > features hardware not support. So does other control fields. > > Yes, I saw it. > >> nested_vmx_exit_ctls_high |= > >> (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | > >> VM_EXIT_LOAD_IA32_EFER); > >> > >> @@ -6707,6 +6714,23 @@ static void vmx_get_exit_info(struct kvm_vcpu > >> *vcpu, u64 *info1, u64 *info2) > >> *info2 = vmcs_read32(VM_EXIT_INTR_INFO); } > >> > >> +static void nested_adjust_preemption_timer(struct kvm_vcpu *vcpu) { > >> + u64 delta_tsc_l1; > >> + u32 preempt_val_l1, preempt_val_l2, preempt_scale; > >> + > >> + preempt_scale = native_read_msr(MSR_IA32_VMX_MISC) & > >> + > MSR_IA32_VMX_MISC_PREEMPTION_TIMER_SCALE; > >> + preempt_val_l2 = vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); > >> + delta_tsc_l1 = kvm_x86_ops->read_l1_tsc(vcpu, > >> + native_read_tsc()) - vcpu->arch.last_guest_tsc; > >> + preempt_val_l1 = delta_tsc_l1 >> preempt_scale; > >> + if (preempt_val_l2 - preempt_val_l1 < 0) > >> + preempt_val_l2 = 0; > >> + else > >> + preempt_val_l2 -= preempt_val_l1; > >> + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, > preempt_val_l2); } > >> /* > >> * The guest has exited. See if we can fix it or if we need userspace > >> * assistance. > >> @@ -6716,6 +6740,7 @@ static int vmx_handle_exit(struct kvm_vcpu > *vcpu) > >> struct vcpu_vmx *vmx = to_vmx(vcpu); > >> u32 exit_reason = vmx->exit_reason; > >> u32 vectoring_info = vmx->idt_vectoring_info; > >> + int ret; > >> > >> /* If guest state is invalid, start emulating */ > >> if (vmx->emulation_required) > >> @@ -6795,12 +6820,15 @@ static int vmx_handle_exit(struct kvm_vcpu > >> *vcpu) > >> > >> if (exit_reason < kvm_vmx_max_exit_handlers > >> && kvm_vmx_exit_handlers[exit_reason]) > >> - return kvm_vmx_exit_handlers[exit_reason](vcpu); > >> + ret = kvm_vmx_exit_handlers[exit_reason](vcpu); > >> else { > >> vcpu->run->exit_reason = KVM_EXIT_UNKNOWN; > >> vcpu->run->hw.hardware_exit_reason = exit_reason; > >> + ret = 0; > >> } > >> - return 0; > >> + if (is_guest_mode(vcpu)) > >> + nested_adjust_preemption_timer(vcpu); > > Move this forward to avoid the changes for ret. > The previous codes simply "return > kvm_vmx_exit_handlers[exit_reason](vcpu);", which may also consumes CPU > times. So put "nested_adjust_preemption_timer" ahead may cause the > statistics inaccuracy. Then you should put it before vmentry. Here still far from the point of vmentry. > >> + return ret; > >> } > >> > >> static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int > >> irr) @@ > >> -7518,6 +7546,7 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, > >> struct vmcs12 *vmcs12) { > >> struct vcpu_vmx *vmx = to_vmx(vcpu); > >> u32 exec_control; > >> + u32 exit_control; > >> > >> vmcs_write16(GUEST_ES_SELECTOR, vmcs12->guest_es_selector); > >> vmcs_write16(GUEST_CS_SELECTOR, vmcs12->guest_cs_selector); > @@ > >> -7691,7 +7720,10 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, > >> struct vmcs12 *vmcs12) > >> * we should use its exit controls. Note that > VM_EXIT_LOAD_IA32_EFER > >> * bits are further modified by vmx_set_efer() below. > >> */ > >> - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); > >> + exit_control = vmcs_config.vmexit_ctrl; > >> + if (vmcs12->pin_based_vm_exec_control & > >> PIN_BASED_VMX_PREEMPTION_TIMER) > >> + exit_control |= > VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; > >> + vmcs_write32(VM_EXIT_CONTROLS, exit_control); > > And here should be problem if host doesn't support > VM_EXIT_SAVE_VMX_PREEMPTION_TIMER. > Nested vmx does check the hardware support of these features in > "nested_vmx_setup_ctls_msrs", see my comments above. But here you don't check it. You just set it unconditionally. > > > >> > >> /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and > VM_ENTRY_IA32E_MODE are > >> * emulated by vmx_set_efer(), below. > >> @@ -8090,6 +8122,17 @@ static void prepare_vmcs12(struct kvm_vcpu > >> *vcpu, struct vmcs12 *vmcs12) > >> vmcs12->guest_pending_dbg_exceptions = > >> vmcs_readl(GUEST_PENDING_DBG_EXCEPTIONS); > >> > >> + if (vmcs12->pin_based_vm_exec_control & > >> + PIN_BASED_VMX_PREEMPTION_TIMER) { > >> + if (vmcs12->vm_exit_controls & > >> + > VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) > >> + vmcs12->vmx_preemption_timer_value = > >> + > vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); > >> + else > >> + > vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, > >> + > >> + vmcs12->vmx_preemption_timer_value); > > Why write it to vmcs directly if VM_EXIT_SAVE_VMX_PREEMPTION_TIMER > not set? > Yes, writing is needless here since vmcs02 will be re-prepared via > prepare_vmcs02() when L1->L2. This function just save information needed, > vmcs_write is useless. > I don't sure I am seeing your point. Per my point, even the VM_EXIT_SAVE_VMX_PREEMPTION_TIMER is not setting, you still need to save the value to vmcs12->vmx_preemption_timer_value. Or else, prepare_vmcs02 cannot get the right value. > Arthur > > > >> + } > >> + > >> /* > >> * In some cases (usually, nested EPT), L2 is allowed to change its > >> * own CR3 without exiting. If it has changed it, we must keep it. > >> -- > >> 1.7.9.5 > >> > >> -- > >> To unsubscribe from this list: send the line "unsubscribe kvm" in the > >> body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at > >> http://vger.kernel.org/majordomo-info.html > > > > Best regards, > > Yang > > > > > -- > To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a > message to majordomo@xxxxxxxxxxxxxxx More majordomo info at > http://vger.kernel.org/majordomo-info.html Best regards, Yang ��.n��������+%������w��{.n�����o�^n�r������&��z�ޗ�zf���h���~����������_��+v���)ߣ�