On Sun, Aug 25, 2013 at 3:28 PM, Jan Kiszka <jan.kiszka@xxxxxx> wrote: > On 2013-08-25 09:24, Arthur Chunqi Li wrote: >> On Sun, Aug 25, 2013 at 2:44 PM, Jan Kiszka <jan.kiszka@xxxxxx> wrote: >>> On 2013-08-24 20:44, root wrote: >>>> This patch contains the following two changes: >>>> 1. Fix the bug in nested preemption timer support. If vmexit L2->L0 >>>> with some reasons not emulated by L1, preemption timer value should >>>> be save in such exits. >>>> 2. Add support of "Save VMX-preemption timer value" VM-Exit controls >>>> to nVMX. >>>> >>>> With this patch, nested VMX preemption timer features are fully >>>> supported. >>>> >>>> Signed-off-by: Arthur Chunqi Li <yzt356@xxxxxxxxx> >>>> --- >>>> arch/x86/kvm/vmx.c | 30 +++++++++++++++++++++++++----- >>>> 1 file changed, 25 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c >>>> index 57b4e12..9579409 100644 >>>> --- a/arch/x86/kvm/vmx.c >>>> +++ b/arch/x86/kvm/vmx.c >>>> @@ -2204,7 +2204,8 @@ static __init void nested_vmx_setup_ctls_msrs(void) >>>> #ifdef CONFIG_X86_64 >>>> VM_EXIT_HOST_ADDR_SPACE_SIZE | >>>> #endif >>>> - VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT; >>>> + VM_EXIT_LOAD_IA32_PAT | VM_EXIT_SAVE_IA32_PAT | >>>> + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER; >>>> nested_vmx_exit_ctls_high |= (VM_EXIT_ALWAYSON_WITHOUT_TRUE_MSR | >>>> VM_EXIT_LOAD_IA32_EFER); >>> >>> In the absence of VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, you need to hide >>> PIN_BASED_VMX_PREEMPTION_TIMER from the guest as we cannot emulate its >>> behavior properly in that case. Besides, we need to test that in the absence of PIN_BASED_VMX_PREEMPTION_TIMER, we need to hide VM_EXIT_SAVE_VMX_PREEMPTION_TIMER, though this should not happen according to Intel SDM. >>> >>>> >>>> @@ -7578,9 +7579,14 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) >>>> (vmcs_config.pin_based_exec_ctrl | >>>> vmcs12->pin_based_vm_exec_control)); >>>> >>>> - if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) >>>> - vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, >>>> - vmcs12->vmx_preemption_timer_value); >>>> + if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) { >>>> + if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER) >>>> + vmcs12->vmx_preemption_timer_value = >>>> + vmcs_read32(VMX_PREEMPTION_TIMER_VALUE); >>>> + else >>>> + vmcs_write32(VMX_PREEMPTION_TIMER_VALUE, >>>> + vmcs12->vmx_preemption_timer_value); >>>> + } >>> >>> This is not correct. We still need to set the vmcs to >>> vmx_preemption_timer_value. The difference is that, on exit from L2, >>> vmx_preemption_timer_value has to be updated according to the saved >>> hardware state. The corresponding code is missing in your patch so far. >>> >>>> >>>> /* >>>> * Whether page-faults are trapped is determined by a combination of >>>> @@ -7690,7 +7696,11 @@ static void prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12) >>>> * we should use its exit controls. Note that VM_EXIT_LOAD_IA32_EFER >>>> * bits are further modified by vmx_set_efer() below. >>>> */ >>>> - vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); >>>> + if (vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) >>>> + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl | >>>> + VM_EXIT_SAVE_VMX_PREEMPTION_TIMER); >>>> + else >>>> + vmcs_write32(VM_EXIT_CONTROLS, vmcs_config.vmexit_ctrl); >>> >>> Let's prepare the value for VM_EXIT_CONTROLS in a local variable first, >>> then write it to the vmcs. >>> >>>> >>>> /* vmcs12's VM_ENTRY_LOAD_IA32_EFER and VM_ENTRY_IA32E_MODE are >>>> * emulated by vmx_set_efer(), below. >>>> @@ -7912,6 +7922,16 @@ static int nested_vmx_run(struct kvm_vcpu *vcpu, bool launch) >>>> } >>>> >>>> /* >>>> + * If L2 support PIN_BASED_VMX_PREEMPTION_TIMER, L0 must support >>>> + * VM_EXIT_SAVE_VMX_PREEMPTION_TIMER. >>>> + */ >>>> + if ((vmcs12->pin_based_vm_exec_control & PIN_BASED_VMX_PREEMPTION_TIMER) && >>>> + !(nested_vmx_exit_ctls_high & VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)) { >>>> + nested_vmx_failValid(vcpu, VMXERR_ENTRY_INVALID_CONTROL_FIELD); >>>> + return 1; >>>> + } >>> >>> Nope, the guest is free to run the preemption timer without saving on >>> exits. It may have a valid use case for this, e.g. that it will always >>> reprogram it on entry. >> Here "!(nested_vmx_exit_ctls_high & >> VM_EXIT_SAVE_VMX_PREEMPTION_TIMER)" is used to detect if hardware >> support "save preemption timer" feature, which means if L2 supports >> pinbased vmx preemption timer, host must support "save preemption >> timer" feature. > > Sorry, parsed the code incorrectly. > >> Though nested_vmx_exit_ctls_* is used for nested env, >> but it can also used to reflect the host's feature. Here is what I >> discuss with you yesterday, and we can also get the feature via >> "rdmsr" here to avoid the confusion. > > Yes. The point is that we will not even expose > PIN_BASED_VMX_PREEMPTION_TIMER if VM_EXIT_SAVE_VMX_PREEMPTION_TIMER is > missing. If the guest then requests the former, it simply sets an > invalid pin-based control value which we already catch and report. So > this hunk becomes redundant. Yep, if we check it when setting nested_vmx_*_ctls_high this hunk is useless. Besides, see comments above. Arthur > > Jan > > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html