I'm going to be on vacation next week, but I'll address these concerns
when I return. If time permits, I'll send out a patch this week that
just addresses the issue of doing a VMPTRLD before putting the VMCS on
the loaded VMCSs list.

On Wed, Jul 27, 2016 at 2:53 PM, Bandan Das <bsd@xxxxxxxxxx> wrote:
> Radim Krčmář <rkrcmar@xxxxxxxxxx> writes:
>
>> 2016-07-22 11:25-0700, Jim Mattson:
>>> If a shadow VMCS is referenced by the VMCS link pointer in the
>>> current VMCS, then VM-entry makes the shadow VMCS active on the
>>> current processor. No VMCS should ever be active on more than one
>>> processor. If a VMCS is to be migrated from one processor to
>>> another, software should execute a VMCLEAR for the VMCS on the
>>> first processor before the VMCS is made active on the second
>>> processor.
>>>
>>> We already keep track of ordinary VMCSs that are made active by
>>> VMPTRLD. Extend that tracking to handle shadow VMCSs that are
>>> made active by VM-entry to their parents.
>>>
>>> Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx>
>>> ---
>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>> @@ -2113,6 +2113,15 @@ static void vmx_vcpu_pi_load(struct kvm_vcpu *vcpu, int cpu)
>>>  			new.control) != old.control);
>>>  }
>>>
>>> +static void record_loaded_vmcs(struct loaded_vmcs *loaded_vmcs, int cpu)
>>> +{
>>> +	if (loaded_vmcs->vmcs) {
>>> +		list_add(&loaded_vmcs->loaded_vmcss_on_cpu_link,
>>> +			 &per_cpu(loaded_vmcss_on_cpu, cpu));
>>> +		loaded_vmcs->cpu = cpu;
>>> +	}
>>> +}
>>> +
>>>  /*
>>>   * Switches to specified vcpu, until a matching vcpu_put(), but assumes
>>>   * vcpu mutex is already taken.
>>> @@ -2124,15 +2133,13 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>
>>>  	if (!vmm_exclusive)
>>>  		kvm_cpu_vmxon(phys_addr);
>>> -	else if (vmx->loaded_vmcs->cpu != cpu)
>>> +	else if (vmx->loaded_vmcs->cpu != cpu) {
>>>  		loaded_vmcs_clear(vmx->loaded_vmcs);
>>> +		if (vmx->nested.current_shadow_vmcs.vmcs)
>>> +			loaded_vmcs_clear(&vmx->nested.current_shadow_vmcs);
>>
>> loaded_vmcs_clear() uses an expensive IPI, so we'd want to do both in
>> one call in future patches, as they are always active on the same CPU.
>
> Maybe just move the check for an active shadow vmcs to loaded_vmcs_clear
> and clear it there unconditionally.
>
>> Another possible complication is marking current_shadow_vmcs as active
>> on a cpu only after a successful vmlaunch.
>> (We don't seem to ever vmptrld shadow vmcs explicitly.)
>>
>>> +        }
>>
>> Incorrect whitespace for indentation.
>>
>>>
>>>  	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
>>> -		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
>>> -		vmcs_load(vmx->loaded_vmcs->vmcs);
>>> -	}
>>> -
>>> -	if (vmx->loaded_vmcs->cpu != cpu) {
>>
>> This condition is nice for performance, because a non-current vmcs is
>> usually already active on the same CPU, so we skip all the code below.
>>
>> (This is the only thing that has to be fixed, as it regresses
>> non-nested; the rest are mostly ideas for simplifications.)
>
> I think he wanted to make sure to call vmcs_load after the call
> to crash_disable_local_vmclear(), but that should be possible without
> removing the check.
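>
> Something along these lines (an untested sketch; "already_loaded" is
> just a name I made up for the result of the existing cpu check) would
> keep the fast path and still put the VMCS on the per-cpu list before
> the VMPTRLD:
>
>	bool already_loaded = vmx->loaded_vmcs->cpu == cpu;
>
>	if (!already_loaded) {
>		local_irq_disable();
>		crash_disable_local_vmclear(cpu);
>		/* same ordering vs. __loaded_vmcs_clear() as today */
>		smp_rmb();
>		list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link,
>			 &per_cpu(loaded_vmcss_on_cpu, cpu));
>		crash_enable_local_vmclear(cpu);
>		local_irq_enable();
>	}
>
>	/* the VMPTRLD now happens only after the list_add above */
>	if (per_cpu(current_vmcs, cpu) != vmx->loaded_vmcs->vmcs) {
>		per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
>		vmcs_load(vmx->loaded_vmcs->vmcs);
>	}
>
>	if (!already_loaded) {
>		/* gdt/sysenter updates stay here, as before */
>		vmx->loaded_vmcs->cpu = cpu;
>	}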
>
> Bandan
>
>>>  		struct desc_ptr *gdt = this_cpu_ptr(&host_gdt);
>>>  		unsigned long sysenter_esp;
>>>
>>> @@ -2147,11 +2154,15 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>  		 */
>>>  		smp_rmb();
>>>
>>> -		list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link,
>>> -			 &per_cpu(loaded_vmcss_on_cpu, cpu));
>>> +		record_loaded_vmcs(vmx->loaded_vmcs, cpu);
>>
>> Adding an element to a list multiple times seems invalid, which the
>> condition was also guarding against.
>>
>>> +		record_loaded_vmcs(&vmx->nested.current_shadow_vmcs, cpu);
>>
>> current_shadow_vmcs is always active on the same cpu as loaded_vmcs ...
>> I think we could skip the list and just clear current_shadow_vmcs when
>> clearing its loaded_vmcs.
>>
>>>  		crash_enable_local_vmclear(cpu);
>>>  		local_irq_enable();
>>>
>>> +	per_cpu(current_vmcs, cpu) = vmx->loaded_vmcs->vmcs;
>>> +	vmcs_load(vmx->loaded_vmcs->vmcs);
>>> +
>>>  	/*
>>>  	 * Linux uses per-cpu TSS and GDT, so set these when switching
>>>  	 * processors.
>>> @@ -2161,8 +2172,6 @@ static void vmx_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
>>>
>>>  		rdmsrl(MSR_IA32_SYSENTER_ESP, sysenter_esp);
>>>  		vmcs_writel(HOST_IA32_SYSENTER_ESP, sysenter_esp); /* 22.2.3 */
>>> -
>>> -	vmx->loaded_vmcs->cpu = cpu;
>>>  	}
>>>
>>>  	/* Setup TSC multiplier */
>>> @@ -6812,6 +6821,34 @@ static int nested_vmx_check_vmptr(struct kvm_vcpu *vcpu, int exit_reason,
>>>  	return 0;
>>>  }
>>>
>>> +static int setup_shadow_vmcs(struct vcpu_vmx *vmx)
>>> +{
>>> +	struct vmcs *shadow_vmcs;
>>> +	int cpu;
>>> +
>>> +	shadow_vmcs = alloc_vmcs();
>>> +	if (!shadow_vmcs)
>>> +		return -ENOMEM;
>>> +
>>> +	/* mark vmcs as shadow */
>>> +	shadow_vmcs->revision_id |= (1u << 31);
>>> +	/* init shadow vmcs */
>>> +	vmx->nested.current_shadow_vmcs.vmcs = shadow_vmcs;
>>> +	loaded_vmcs_init(&vmx->nested.current_shadow_vmcs);
>>> +
>>> +	cpu = get_cpu();
>>> +	local_irq_disable();
>>> +	crash_disable_local_vmclear(cpu);
>>> +
>>> +	record_loaded_vmcs(&vmx->nested.current_shadow_vmcs, cpu);
>>
>> This could be avoided if we assumed that shadow vmcs is always active
>> on the same cpu as its parent. The assumption looks rock-solid, because
>> shadow vmcs is activated on vmlaunch and its linking vmcs must be
>> active (and current) on the same CPU.
>>
>>> +
>>> +	crash_enable_local_vmclear(cpu);
>>> +	local_irq_enable();
>>> +	put_cpu();
>>> +
>>> +	return 0;
>>> +}
>>> +
>>>  /*
>>>   * Emulate the VMXON instruction.
>>>   * Currently, we just remember that VMX is active, and do not save or even
>>> @@ -6867,14 +6903,9 @@ static int handle_vmon(struct kvm_vcpu *vcpu)
>>>  	}
>>>
>>>  	if (enable_shadow_vmcs) {
>>> -		shadow_vmcs = alloc_vmcs();
>>> -		if (!shadow_vmcs)
>>> -			return -ENOMEM;
>>> -		/* mark vmcs as shadow */
>>> -		shadow_vmcs->revision_id |= (1u << 31);
>>> -		/* init shadow vmcs */
>>> -		vmcs_clear(shadow_vmcs);
>>> -		vmx->nested.current_shadow_vmcs = shadow_vmcs;
>>> +		int ret = setup_shadow_vmcs(vmx);
>>> +		if (ret < 0)
>>> +			return ret;
>>>  	}
>>>
>>>  	INIT_LIST_HEAD(&(vmx->nested.vmcs02_pool));
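>>
>> To illustrate the simplification I mean (an untested sketch; the
>> shadow_vmcs member of struct loaded_vmcs is hypothetical, it does not
>> exist today): tie the shadow to its parent, so the one IPI that clears
>> the parent also clears the shadow and current_shadow_vmcs never needs
>> its own loaded_vmcss_on_cpu entry.
>>
>>	struct loaded_vmcs {
>>		struct vmcs *vmcs;
>>		struct vmcs *shadow_vmcs;	/* hypothetical: shadow tied to parent */
>>		int cpu;
>>		int launched;
>>		struct list_head loaded_vmcss_on_cpu_link;
>>	};
>>
>>	static void loaded_vmcs_init(struct loaded_vmcs *loaded_vmcs)
>>	{
>>		vmcs_clear(loaded_vmcs->vmcs);
>>		/* the shadow is active only if the parent was ever launched */
>>		if (loaded_vmcs->shadow_vmcs && loaded_vmcs->launched)
>>			vmcs_clear(loaded_vmcs->shadow_vmcs);
>>		loaded_vmcs->cpu = -1;
>>		loaded_vmcs->launched = 0;
>>	}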