On Wed, 2018-05-23 at 00:59 -0700, Liran Alon wrote:
> ----- liran.alon@xxxxxxxxxx wrote:
>
> > ----- jmattson@xxxxxxxxxx wrote:
> >
> > > While we're on the subject, is there any need for L0 to allocate a
> > > vpid02 in the common case, where nested EPT is enabled?
> > >
> > > Per section 28.3.2 of the SDM, volume 3, when EPT is enabled,
> > > combined mappings in the TLB are tagged by {VPID, PCID, EP4TA}.
> > > With nested EPT, vmcs02 and vmcs01 do not share an EP4TA.
> > > Therefore, I think it suffices to simply copy the VPID field from
> > > vmcs12 to vmcs02 in this case.
> >
> > Good point. I agree.
> > This will trivially allow physical CPU to save multiple TLB entries
> > populated by L2 with same EP4TA but different VPIDs.
> >
> > I do think however that this should be done on a separate patch series
> > on top of this one.
> > I will check if I can easily create that series of patches.
> >
> > Thanks,
> > -Liran
>
> After some initial investigation, it seems current TLB management in KVM is worse than I thought.
>
> By looking at vmx_set_cr3() (which is the only place which write to VMCS EPT_POINTER),
> it seems that every load of a new EPT pointer will vmx_flush_tlb(vcpu, true);
> In case of running with enable_ept, this will flush all TLB entries tagged with new loaded EPTP.
>
> This means that on nVMX scenario where vmcs12 uses EPT, the TLB effectively gets flushed
> every time you switch between L1 and L2...
>
> In addition, even in non-nVMX scenarios, in the CPU over-commit case, if a physical CPU switches
> between running a vCPU of one VM to a vCPU of another VM, it will keep flushing TLB entries of both VMs
> even though they are tagged with separate EPTP.

nVMX aside, KVM's overarching design is to load a new MMU root, i.e.
EPTP, only when necessary.  Switching VMCSes should not invoke
vmx_set_cr3() regardless of what prompted the VMCS switch, e.g.
kvm_mmu_reload() only invokes set_cr3() if the MMU root is invalid,
and vmx_vcpu_put() doesn't unload the MMU.

As for nVMX, both nested entry and exit explicitly reset the MMU via
nested_vmx_load_cr3(), and nested entry also unloads the MMU when
nested EPT is active, via nested_ept_init_mmu_context().  Unloading
the MMU on nested entry/exit doesn't seem deliberate, e.g. why bother
with VPID handling in prepare_vmcs02() if KVM intends to
unconditionally flush?

I think figuring out how to avoid unloading the MMU in those cases
will resolve the issue of the TLB being flushed on every switch
between L1 and L2, though I get the feeling that that will mean doing
a holistic analysis of the (nested) MMU handling.

> Due to the above, I think I will create a series that will first fix this issue and then perform
> the optimization suggested by Jim here.
>
> Regards,
> -Liran
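
To make the cost of unloading the MMU on every nested transition concrete, here is a rough toy model.  It is not KVM code: every name in it (toy_mmu, toy_mmu_reload(), toy_mmu_unload(), toy_count_flushes()) is hypothetical, it assumes one MMU context each for L1 and L2, and it assumes that loading a root always implies a TLB flush, as described above for vmx_set_cr3().  The only thing it borrows from the discussion is the "reload only if the root is invalid" policy.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for one MMU context (hypothetical, not a KVM struct). */
struct toy_mmu {
	bool root_valid;	/* is a root, e.g. an EPTP, currently loaded? */
	unsigned int flushes;	/* TLB flushes caused by loading a root */
};

/* Load a root only if the current one is invalid. */
static void toy_mmu_reload(struct toy_mmu *mmu)
{
	assert(mmu);
	if (mmu->root_valid)
		return;
	mmu->root_valid = true;
	mmu->flushes++;		/* assumption: a root load implies a flush */
}

/* Invalidate the root, forcing the next reload to load (and flush). */
static void toy_mmu_unload(struct toy_mmu *mmu)
{
	assert(mmu);
	mmu->root_valid = false;
}

/*
 * Count the flushes seen over a number of L1<->L2 transitions.
 * ctx[0] models L1's context and ctx[1] models L2's context.
 */
unsigned int toy_count_flushes(unsigned int transitions, bool unload_on_switch)
{
	struct toy_mmu ctx[2] = { { false, 0 }, { false, 0 } };
	unsigned int active = 0;
	unsigned int i;

	toy_mmu_reload(&ctx[active]);	/* first run of L1 */

	for (i = 0; i < transitions; i++) {
		active ^= 1;		/* nested entry or nested exit */
		if (unload_on_switch)
			toy_mmu_unload(&ctx[active]);
		toy_mmu_reload(&ctx[active]);
	}

	return ctx[0].flushes + ctx[1].flushes;
}
```

In this model, toy_count_flushes(10, true) returns 11, i.e. one flush per transition plus the initial load, while toy_count_flushes(10, false) returns 2, i.e. one load per context.  The real code has far more reasons to invalidate a root, so this only illustrates why the unconditional unload is the part worth removing.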