----- pbonzini@xxxxxxxxxx wrote:

> On 31/05/2018 14:52, Liran Alon wrote:
> >
> > ----- pbonzini@xxxxxxxxxx wrote:
> >
> >> On 23/05/2018 16:44, Sean Christopherson wrote:
> >>> Unloading the MMU on nested entry/exit doesn't seem deliberate,
> >>> e.g. why bother with VPID handling in prepare_vmcs02() if KVM
> >>> intends to unconditionally flush? I think figuring out how to
> >>> avoid unloading the MMU in those cases will resolve the issue
> >>> of the TLB being flushed on every switch between L1 and L2,
> >>> though I get the feeling that that will mean doing a holistic
> >>> analysis of the (nested) MMU handling.
> >>
> >> My plan there was just to add a third kvm_mmu struct, which (as a
> >> start) requires getting rid of all references to vcpu->arch.mmu in
> >> arch/x86/kvm/.
> >>
> >> Paolo
> >
> > I'm planning to submit a series to handle this issue.
> > However, I'm not sure I got your plan. Can you elaborate?
> > How do you envision adding a third kvm_mmu struct helping to
> > resolve this issue?
>
> I would like to avoid the kvm_mmu_reset_context on nested
> vmentry/vmexit. Hopefully everything flows from that...
>
> Paolo

I understand that, but there are some aspects to consider here.

Switching between EPT01 and EPT02 currently requires setting root_hpa to
INVALID so that vcpu_enter_guest() -> kvm_mmu_reload() -> kvm_mmu_load() ->
vmx_set_cr3() will be called to actually switch the EPTP. In addition,
because EPT02 shadows EPT12, EPT02 needs to be synced against EPT12 before
each load of EPT02 into the EPTP. Thus, a VMEntry operation is similar to
loading a new CR3 value on the standard shadow MMU. In contrast, a VMExit
operation doesn't need to sync anything, as EPT01 is in direct mode. This
is why VMEntry indeed calls kvm_mmu_unload(), which calls mmu_free_roots(),
exactly as kvm_set_cr3() calls kvm_mmu_new_cr3() which calls
mmu_free_roots().

When running with both VPID and EPT enabled, TLB entries are tagged with
both VPID and EPTP. Therefore, we should not need to flush the TLB when
switching between EPTPs. However, vmx_set_cr3() currently always flushes
the TLB. Digging into why this is done, we find the following:

1) Before VPID existed, every VMEntry/VMExit caused a full TLB flush.
   Therefore, vmx_set_cr3() didn't contain a vmx_flush_tlb().
2) The commit which enabled VPID usage in KVM (2384d2b32640 ("KVM: VMX:
   Enable Virtual Processor Identification (VPID)")) added a
   vmx_flush_tlb() to vmx_set_cr3(). In addition, it added a call to
   vpid_sync_vcpu_all() to vmx_vcpu_load() when switching the loaded VMCS
   on a physical CPU.

To me, (2) seems suboptimal because:

(a) Any shadow MMU sync/zap operation should request a KVM_REQ_TLB_FLUSH.
    In addition, any shadowed CR3 change should also result in a shadow
    MMU sync which will request a KVM_REQ_TLB_FLUSH. Thus, there is no
    need to rely on the flush that happens as a side effect of
    vmx_set_cr3() being called because root_hpa is INVALID.
(b) When switching the loaded VMCS on a physical CPU, we need an INVVPID
    single-context only in case the new VMCS's VPID was already used by
    another VMCS which ran on that CPU. However, because vmx_vpid_bitmap
    is global, we can instead just make sure to INVVPID single-context a
    VPID on all CPUs when a vCPU is destroyed and free_vpid() is called.
(c) Similarly, because EPTPs are globally unique, INVEPT should be needed
    today only when the MMU requests a KVM_REQ_TLB_FLUSH as a result of
    EPT entries changing, or because an EPT table was freed (zapped),
    which should also result in a KVM_REQ_TLB_FLUSH. Thus, we should also
    not need to INVEPT in vmx_set_cr3().
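To make this concrete, below is roughly the shape of vmx_set_cr3() as I
read it. This is an abridged sketch from memory rather than a verbatim
copy of the upstream code, so the exact helpers and signatures (e.g. the
vmx_flush_tlb() arguments, construct_eptp()) may differ between kernel
versions:

static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
{
        unsigned long guest_cr3 = cr3;

        if (enable_ept) {
                /* Program the EPTP for the new root (EPT01 or EPT02). */
                vmcs_write64(EPT_POINTER, construct_eptp(vcpu, cr3));
                /* (guest_cr3 / PDPTR handling elided) */
        }

        /*
         * The unconditional flush introduced by 2384d2b32640. This is
         * what makes every EPT01 <-> EPT02 switch pay for a TLB flush,
         * even though TLB entries are already tagged with VPID and EPTP.
         */
        vmx_flush_tlb(vcpu, true);

        vmcs_writel(GUEST_CR3, guest_cr3);
}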
Therefore, I think the real question is why vmx_flush_tlb() is done in
vmx_set_cr3() to begin with. If it didn't flush there, the TLB wouldn't be
flushed when switching between EPT01 and EPT02 and vice versa, even though
kvm_mmu_unload() is called (similar to what is done when switching CR3
values on the standard shadow MMU).

Thoughts?

-Liran