Re: [PATCH 0/4]: KVM: nVMX: VPID optimizations

Liran Alon <liran.alon@xxxxxxxxxx> · Wed, 23 May 2018 10:43:58 -0700 (PDT)

----- sean.j.christopherson@xxxxxxxxx wrote:

> On Wed, 2018-05-23 at 00:59 -0700, Liran Alon wrote:
> > ----- liran.alon@xxxxxxxxxx wrote:
> > 
> > > 
> > > ----- jmattson@xxxxxxxxxx wrote:
> > > 
> > > > 
> > > > While we're on the subject, is there any need for L0 to allocate a
> > > > vpid02 in the common case, where nested EPT is enabled?
> > > > 
> > > > Per section 28.3.2 of the SDM, volume 3, when EPT is enabled, combined
> > > > mappings in the TLB are tagged by {VPID, PCID, EP4TA}. With nested
> > > > EPT, vmcs02 and vmcs01 do not share an EP4TA. Therefore, I think it
> > > > suffices to simply copy the VPID field from vmcs12 to vmcs02 in this
> > > > case.
> > > > 
> > > Good point. I agree.
> > > This will trivially allow physical CPU to save multiple TLB entries
> > > populated by L2 with same EP4TA but different VPIDs.
> > > 
> > > I do think however that this should be done on a separate patch series
> > > on top of this one.
> > > I will check if I can easily create that series of patches.
> > > 
> > > Thanks,
> > > -Liran
> > After some initial investigation, it seems current TLB management in
> > KVM is worse than I thought.
> > 
> > By looking at vmx_set_cr3() (which is the only place which write to
> > VMCS EPT_POINTER), it seems that every load of a new EPT pointer will
> > vmx_flush_tlb(vcpu, true);
> > In case of running with enable_ept, this will flush all TLB entries
> > tagged with new loaded EPTP.
> > 
> > This means that on nVMX scenario where vmcs12 uses EPT, the TLB
> > effectively gets flushed every time you switch between L1 and L2...
> > 
> > In addition, even in non-nVMX scenarios, in the CPU over-commit
> > case, if a physical CPU switches between running a vCPU of one VM
> > to a vCPU of another VM, it will keep flushing TLB entries of both VMs
> > even though they are tagged with separate EPTP.
> 
> nVMX aside, KVM's overarching design is to load a new MMU root,
> i.e. EPTP, only when necessary.  Switching VMCSes should not
> invoke vmx_set_cr3() regardless of what prompted the VMCS switch,
> e.g. kvm_mmu_reload() only invokes set_cr3() if the MMU root is
> invalid, and vmx_vcpu_put() doesn't unload the MMU.

Yeah, you are right for the non-nVMX case.
I mistakenly overlooked that part in my description.

> 
> As for nVMX, both nested entry and exit explicitly reset the MMU
> via nested_vmx_load_cr3(), and nested entry also unloads the MMU
> when nested EPT is active, via nested_ept_init_mmu_context().
> 
> Unloading the MMU on nested entry/exit doesn't seem deliberate,
> e.g. why bother with VPID handling in prepare_vmcs02() if KVM
> intends to unconditionally flush?  I think figuring out how to
> avoid unloading the MMU in those cases will resolve the issue
> of the TLB being flushed on every switch between L1 and L2,
> though I get the feeling that that will mean doing a holistic
> analysis of the (nested) MMU handling.

Yes. Those are exactly my thoughts and what I planned to do.
I will create a series that will attempt to handle this.

> 
> > Due to the above, I think I will create a series that will first fix
> > this issue and then perform the optimization suggested by Jim here.
> > 
> > Regards,
> > -Liran