I'm going to be on vacation next week, but I'll address these concerns
when I return. If time permits, I'll send out a patch this week that
just addresses the issue of doing a VMPTRLD before putting the VMCS on
the loaded VMCSs list.

On Wed, Jul 27, 2016 at 2:53 PM, Bandan Das <bsd@xxxxxxxxxx> wrote:
> Radim Krčmář <rkrcmar@xxxxxxxxxx> writes:
>
>> 2016-07-22 11:25-0700, Jim Mattson:
>>> If a shadow VMCS is referenced by the VMCS link pointer in the
>>> current VMCS, then VM-entry makes the shadow VMCS active on the
>>> current processor. No VMCS should ever be active on more than one
>>> processor. If a VMCS is to be migrated from one processor to
>>> another, software should execute a VMCLEAR for the VMCS on the
>>> first processor before the VMCS is made active on the second
>>> processor.
>>>
>>> We already keep track of ordinary VMCSs that are made active by
>>> VMPTRLD. Extend that tracking to handle shadow VMCSs that are
>>> made active by VM-entry to their parents.
>>>
>>> Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx>
>>> ---
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> @@ -2113,6 +2113,15 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
>>>  			new.control) != old.control);
>>>  }
>>>
>>> +static void record_loaded_vmcs(struct loaded_vmcs *loaded_vmcs, int cpu)
>>> +{
>>> +	if (loaded_vmcs->vmcs) {
>>> +		list_add(&loaded_vmcs->loaded_vmcss_on_cpu_link,
>>> +			 &per_cpu(loaded_vmcss_on_cpu, cpu));
>>> +		loaded_vmcs->cpu = cpu;
>>> +	}
>>> +}
>>> +
>>>  /*
>>>   * Switches to specified vcpu, until a matching vcpu_put(), but assumes
>>>   * vcpu mutex is already taken.
>>> @@ -2124,15 +2133,13 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>
>>>  	if (!vmm_exclusive)
>>>  		kvm_cpu_vmxon(phys_addr);
>>> -	else if (vmx->loaded_vmcs->cpu != cpu)
>>> +	else if (vmx->loaded_vmcs->cpu != cpu) {
>>>  		loaded_vmcs_clear(vmx->loaded_vmcs);
>>> +		if (vmx->nested.current_shadow_vmcs.vmcs)
>>> +			loaded_vmcs_clear(&vmx->nested.current_shadow_vmcs);
>>
>> loaded_vmcs_clear() uses an expensive IPI, so we'd want to do both in
>> one call in future patches, as they are always active on the same CPU.
>
> Maybe just move the check for an active shadow vmcs to loaded_vmcs_clear
> and clear it there unconditionally.
>
>> Another possible complication is marking current_shadow_vmcs as active
>> on a cpu only after a successful vmlaunch.
>> (We don't seem to ever vmptrld shadow vmcs explicitly.)
>>
>>> +        }
>>
>> Incorrect whitespace for indentation.
>>
>>>
>>>  	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
>>> -		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
>>> -		vmcs_load(vmx->loaded_vmcs->vmcs);
>>> -	}
>>> -
>>> -	if (vmx->loaded_vmcs->cpu != cpu) {
>>
>> This condition is nice for performance, because a non-current vmcs is
>> usually already active on the same CPU, so we skip all the code below.
>>
>> (This is the only thing that has to be fixed, as it regresses
>> non-nested; the rest are mostly ideas for simplifications.)
>
> I think he wanted to make sure to call vmcs_load after the call
> to crash_disable_local_vmclear(), but that should be possible without
> removing the check.
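>
> Something along these lines (an untested sketch; "already_loaded" is
> just a name I made up for the result of the existing cpu check) would
> keep the fast path and still put the VMCS on the per-cpu list before
> the VMPTRLD:
>
>	bool already_loaded = vmx->loaded_vmcs->cpu == cpu;
>
>	if (!already_loaded) {
>		local_irq_disable();
>		crash_disable_local_vmclear(cpu);
>		/* same ordering vs. __loaded_vmcs_clear() as today */
>		smp_rmb();
>		list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link,
>			 &per_cpu(loaded_vmcss_on_cpu, cpu));
>		crash_enable_local_vmclear(cpu);
>		local_irq_enable();
>	}
>
>	/* the VMPTRLD now happens only after the list_add above */
>	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
>		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
>		vmcs_load(vmx->loaded_vmcs->vmcs);
>	}
>
>	if (!already_loaded) {
>		/* gdt/sysenter updates stay here, as before */
>		vmx->loaded_vmcs->cpu = cpu;
>	}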
>
> Bandan
>
>>>  		struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
>>>  		unsigned long sysenter_esp;
>>>
>>> @@ -2147,11 +2154,15 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>  		 */
>>>  		smp_rmb();
>>>
>>> -		list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link,
>>> -			 &per_cpu(loaded_vmcss_on_cpu, cpu));
>>> +		record_loaded_vmcs(vmx->loaded_vmcs, cpu);
>>
>> Adding an element to a list multiple times seems invalid, which the
>> condition was also guarding against.
>>
>>> +		record_loaded_vmcs(&vmx->nested.current_shadow_vmcs, cpu);
>>
>> current_shadow_vmcs is always active on the same cpu as loaded_vmcs ...
>> I think we could skip the list and just clear current_shadow_vmcs when
>> clearing its loaded_vmcs.
>>
>>>  		crash_enable_local_vmclear(cpu);
>>>  		local_irq_enable();
>>>
>>> +	per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
>>> +	vmcs_load(vmx->loaded_vmcs->vmcs);
>>> +
>>>  	/*
>>>  	 * Linux uses per-cpu TSS and GDT, so set these when switching
>>>  	 * processors.
>>> @@ -2161,8 +2172,6 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>
>>>  		rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
>>>  		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
>>> -
>>> -	vmx->loaded_vmcs->cpu = cpu;
>>>  	}
>>>
>>>  	/* Setup TSC multiplier */
>>> @@ -6812,6 +6821,34 @@ static int nested_vmx_check_vmptr(struct kvm_vcpu *vcpu, int exit_reason,
>>>  	return 0;
>>>  }
>>>
>>> +static int setup_shadow_vmcs(struct vcpu_vmx *vmx)
>>> +{
>>> +	struct vmcs *shadow_vmcs;
>>> +	int cpu;
>>> +
>>> +	shadow_vmcs = alloc_vmcs();
>>> +	if (!shadow_vmcs)
>>> +		return -ENOMEM;
>>> +
>>> +	/* mark vmcs as shadow */
>>> +	shadow_vmcs->revision_id |= (1u << 31);
>>> +	/* init shadow vmcs */
>>> +	vmx->nested.current_shadow_vmcs.vmcs = shadow_vmcs;
>>> +	loaded_vmcs_init(&vmx->nested.current_shadow_vmcs);
>>> +
>>> +	cpu = get_cpu();
>>> +	local_irq_disable();
>>> +	crash_disable_local_vmclear(cpu);
>>> +
>>> +	record_loaded_vmcs(&vmx->nested.current_shadow_vmcs, cpu);
>>
>> This could be avoided if we assumed that shadow vmcs is always active
>> on the same cpu as its parent. The assumption looks rock-solid, because
>> shadow vmcs is activated on vmlaunch and its linking vmcs must be
>> active (and current) on the same CPU.
>>
>>> +
>>> +	crash_enable_local_vmclear(cpu);
>>> +	local_irq_enable();
>>> +	put_cpu();
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>  /*
>>>   * Emulate the VMXON instruction.
>>>   * Currently, we just remember that VMX is active, and do not save or even
>>> @@ -6867,14 +6903,9 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
>>>  	}
>>>
>>>  	if (enable_shadow_vmcs) {
>>> -		shadow_vmcs = alloc_vmcs();
>>> -		if (!shadow_vmcs)
>>> -			return -ENOMEM;
>>> -		/* mark vmcs as shadow */
>>> -		shadow_vmcs->revision_id |= (1u << 31);
>>> -		/* init shadow vmcs */
>>> -		vmcs_clear(shadow_vmcs);
>>> -		vmx->nested.current_shadow_vmcs = shadow_vmcs;
>>> +		int ret = setup_shadow_vmcs(vmx);
>>> +		if (ret < 0)
>>> +			return ret;
>>>  	}
>>>
>>>  	INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
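>>
>> To illustrate the simplification I mean (an untested sketch; the
>> shadow_vmcs member of struct loaded_vmcs is hypothetical, it does not
>> exist today): tie the shadow to its parent, so the one IPI that clears
>> the parent also clears the shadow and current_shadow_vmcs never needs
>> its own loaded_vmcss_on_cpu entry.
>>
>>	struct loaded_vmcs {
>>		struct vmcs *vmcs;
>>		struct vmcs *shadow_vmcs;	/* hypothetical: shadow tied to parent */
>>		int cpu;
>>		int launched;
>>		struct list_head loaded_vmcss_on_cpu_link;
>>	};
>>
>>	static void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
>>	{
>>		vmcs_clear(loaded_vmcs->vmcs);
>>		/* the shadow is active only if the parent was ever launched */
>>		if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
>>			vmcs_clear(loaded_vmcs->shadow_vmcs);
>>		loaded_vmcs->cpu = -1;
>>		loaded_vmcs->launched = 0;
>>	}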