Re: [PATCH 0/4]: KVM: nVMX: VPID optimizations

Sean Christopherson <sean.j.christopherson@xxxxxxxxx> · Wed, 23 May 2018 07:44:26 -0700

On Wed, 2018-05-23 at 00:59 -0700, Liran Alon wrote:
> ----- liran.alon@xxxxxxxxxx wrote:
> 
> > 
> > ----- jmattson@xxxxxxxxxx wrote:
> > 
> > > 
> > > While we're on the subject, is there any need for L0 to allocate a
> > > vpid02 in the common case, where nested EPT is enabled?
> > > 
> > > Per section 28.3.2 of the SDM, volume 3, when EPT is enabled, combined
> > > mappings in the TLB are tagged by {VPID, PCID, EP4TA}. With nested
> > > EPT, vmcs02 and vmcs01 do not share an EP4TA. Therefore, I think it
> > > suffices to simply copy the VPID field from vmcs12 to vmcs02 in this
> > > case.
> > Good point. I agree.
> > This will trivially allow the physical CPU to retain multiple TLB entries
> > populated by L2 with the same EP4TA but different VPIDs.
> > 
> > I do think, however, that this should be done in a separate patch series
> > on top of this one.
> > I will check if I can easily create that series of patches.
> > 
> > Thanks,
> > -Liran
> After some initial investigation, it seems the current TLB management in KVM is worse than I thought.
> 
> By looking at vmx_set_cr3() (which is the only place that writes to the VMCS EPT_POINTER field),
> it seems that every load of a new EPT pointer calls vmx_flush_tlb(vcpu, true).
> When running with enable_ept, this flushes all TLB entries tagged with the newly loaded EPTP.
> 
> This means that in an nVMX scenario where vmcs12 uses EPT, the TLB effectively gets flushed
> every time you switch between L1 and L2...
> 
> In addition, even in non-nVMX scenarios, in the CPU over-commit case, if a physical CPU switches
> from running a vCPU of one VM to running a vCPU of another VM, it will keep flushing TLB entries
> of both VMs even though they are tagged with separate EPTPs.

nVMX aside, KVM's overarching design is to load a new MMU root,
i.e. EPTP, only when necessary.  Switching VMCSes should not
invoke vmx_set_cr3() regardless of what prompted the VMCS switch,
e.g. kvm_mmu_reload() only invokes set_cr3() if the MMU root is
invalid, and vmx_vcpu_put() doesn't unload the MMU.
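
To make the "only when necessary" rule concrete, here is a toy userspace
sketch.  It is not KVM code: every toy_* name is hypothetical, and the
load counter merely stands in for the flush that accompanies loading a
new EPTP in the scenario described above.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only; none of these names exist in KVM. */
#define TOY_INVALID_ROOT 0ULL

struct toy_vcpu {
	uint64_t mmu_root;       /* stand-in for the EPTP; 0 means invalid */
	unsigned int root_loads; /* how many times a root was loaded */
};

/* Stand-in for the set_cr3() path: install a root and count the load. */
static void toy_load_root(struct toy_vcpu *v, uint64_t root)
{
	assert(root != TOY_INVALID_ROOT);
	v->mmu_root = root;
	v->root_loads++;
}

/* Stand-in for kvm_mmu_reload(): load a root only if none is valid. */
static void toy_mmu_reload(struct toy_vcpu *v, uint64_t root)
{
	if (v->mmu_root == TOY_INVALID_ROOT)
		toy_load_root(v, root);
}

/* Stand-in for unloading the MMU: the next reload must load a root. */
static void toy_mmu_unload(struct toy_vcpu *v)
{
	v->mmu_root = TOY_INVALID_ROOT;
}
```

In this model, back-to-back toy_mmu_reload() calls cost a single root
load, whereas a toy_mmu_unload() in between forces a second one.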

As for nVMX, both nested entry and exit explicitly reset the MMU
via nested_vmx_load_cr3(), and nested entry also unloads the MMU
when nested EPT is active, via nested_ept_init_mmu_context().

Unloading the MMU on nested entry/exit doesn't seem deliberate,
e.g. why bother with VPID handling in prepare_vmcs02() if KVM
intends to unconditionally flush?  I think figuring out how to
avoid unloading the MMU in those cases will resolve the issue
of the TLB being flushed on every switch between L1 and L2,
though I get the feeling that that will mean doing a holistic
analysis of the (nested) MMU handling.
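
For reference, a toy userspace model of the tagging that Jim cites from
the SDM, where combined mappings are keyed by {VPID, PCID, EP4TA}.  It
is only a sketch: the toy_* names, the fixed-size array and the
flush-by-EP4TA helper (loosely in the spirit of a single-context INVEPT)
are all assumptions made for illustration, not hardware or KVM behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TLB_SIZE 8

/* Hypothetical tag, modeled on {VPID, PCID, EP4TA}. */
struct toy_tag { uint16_t vpid; uint16_t pcid; uint64_t ep4ta; };
struct toy_entry { bool valid; struct toy_tag tag; uint64_t gva, hpa; };
struct toy_tlb { struct toy_entry e[TOY_TLB_SIZE]; };

static bool toy_tag_eq(struct toy_tag a, struct toy_tag b)
{
	return a.vpid == b.vpid && a.pcid == b.pcid && a.ep4ta == b.ep4ta;
}

/* Cache a translation in the first free slot; false if the TLB is full. */
static bool toy_tlb_insert(struct toy_tlb *t, struct toy_tag tag,
			   uint64_t gva, uint64_t hpa)
{
	assert(t != NULL);
	for (size_t i = 0; i < TOY_TLB_SIZE; i++) {
		if (!t->e[i].valid) {
			t->e[i] = (struct toy_entry){ true, tag, gva, hpa };
			return true;
		}
	}
	return false;
}

/* Hit only if both the full tag and the address match. */
static bool toy_tlb_lookup(const struct toy_tlb *t, struct toy_tag tag,
			   uint64_t gva, uint64_t *hpa)
{
	for (size_t i = 0; i < TOY_TLB_SIZE; i++) {
		const struct toy_entry *e = &t->e[i];

		if (e->valid && e->gva == gva && toy_tag_eq(e->tag, tag)) {
			*hpa = e->hpa;
			return true;
		}
	}
	return false;
}

/* Drop every entry tagged with ep4ta; returns the number dropped. */
static size_t toy_tlb_flush_ep4ta(struct toy_tlb *t, uint64_t ep4ta)
{
	size_t dropped = 0;

	for (size_t i = 0; i < TOY_TLB_SIZE; i++) {
		if (t->e[i].valid && t->e[i].tag.ep4ta == ep4ta) {
			t->e[i].valid = false;
			dropped++;
		}
	}
	return dropped;
}
```

In this model, two entries for the same address and the same VPID but
different EP4TAs coexist without aliasing, and flushing one EP4TA leaves
the other context's entries intact.  It also shows the cost side: if
every switch into a context flushes that context's EP4TA, the context
always starts with a cold TLB.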

> Due to the above, I think I will create a series that will first fix this issue and then perform
> the optimization suggested by Jim here.
> 
> Regards,
> -Liran