----- pbonzini@xxxxxxxxxx wrote:

> On 31/05/2018 14:52, Liran Alon wrote:
> >
> > ----- pbonzini@xxxxxxxxxx wrote:
> >
> >> On 23/05/2018 16:44, Sean Christopherson wrote:
> >>> Unloading the MMU on nested entry/exit doesn't seem deliberate,
> >>> e.g. why bother with VPID handling in prepare_vmcs02() if KVM
> >>> intends to unconditionally flush? I think figuring out how to
> >>> avoid unloading the MMU in those cases will resolve the issue
> >>> of the TLB being flushed on every switch between L1 and L2,
> >>> though I get the feeling that that will mean doing a holistic
> >>> analysis of the (nested) MMU handling.
> >>
> >> My plan there was just to add a third kvm_mmu struct, which (as a
> >> start) requires getting rid of all references to vcpu->arch.mmu in
> >> arch/x86/kvm/.
> >>
> >> Paolo
> >
> > I'm planning to submit a series to handle this issue.
> > However, I'm not sure I got your plan. Can you elaborate?
> > How do you envision adding a third kvm_mmu struct helping to
> > resolve this issue?
>
> I would like to avoid the kvm_mmu_reset_context on nested
> vmentry/vmexit. Hopefully everything flows from that...
>
> Paolo

I understand that, but there are some aspects to consider here.

Switching between EPT01 and EPT02 currently requires setting root_hpa to
INVALID so that vcpu_enter_guest() -> kvm_mmu_reload() -> kvm_mmu_load() ->
vmx_set_cr3() will be called to actually switch the EPTP. In addition,
because EPT02 shadows EPT12, EPT02 needs to be synced against EPT12 before
each load of EPT02 into the EPTP. Thus, a VMEntry operation is similar to
loading a new CR3 value on the standard shadow MMU. In contrast, a VMExit
operation doesn't need to sync anything, as EPT01 is in direct mode. This
is why VMEntry indeed calls kvm_mmu_unload(), which calls mmu_free_roots(),
exactly as kvm_set_cr3() calls kvm_mmu_new_cr3() which calls
mmu_free_roots().

When running with both VPID and EPT enabled, TLB entries are tagged with
both VPID and EPTP. Therefore, we should not need to flush the TLB when
switching between EPTPs. However, vmx_set_cr3() currently always flushes
the TLB. Digging into why this is done, we find the following:

1) Before VPID existed, every VMEntry/VMExit caused a full TLB flush.
   Therefore, vmx_set_cr3() didn't contain a vmx_flush_tlb().
2) The commit which enabled VPID usage in KVM (2384d2b32640 ("KVM: VMX:
   Enable Virtual Processor Identification (VPID)")) added a
   vmx_flush_tlb() to vmx_set_cr3(). In addition, it added a call to
   vpid_sync_vcpu_all() to vmx_vcpu_load() when switching the loaded VMCS
   on a physical CPU.

To me, (2) seems suboptimal because:

(a) Any shadow MMU sync/zap operation should request a KVM_REQ_TLB_FLUSH.
    In addition, any shadowed CR3 change should also result in a shadow
    MMU sync which will request a KVM_REQ_TLB_FLUSH. Thus, there is no
    need to rely on the flush that happens as a side effect of
    vmx_set_cr3() being called because root_hpa is INVALID.
(b) When switching the loaded VMCS on a physical CPU, we need an INVVPID
    single-context only in case the new VMCS's VPID was already used by
    another VMCS which ran on that CPU. However, because vmx_vpid_bitmap
    is global, we can instead just make sure to INVVPID single-context a
    VPID on all CPUs when a vCPU is destroyed and free_vpid() is called.
(c) Similarly, because EPTPs are globally unique, INVEPT should be needed
    today only when the MMU requests a KVM_REQ_TLB_FLUSH as a result of
    EPT entries changing, or because an EPT table was freed (zapped),
    which should also result in a KVM_REQ_TLB_FLUSH. Thus, we should also
    not need to INVEPT in vmx_set_cr3().
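To make this concrete, below is roughly the shape of vmx_set_cr3() as I
read it. This is an abridged sketch from memory rather than a verbatim
copy of the upstream code, so the exact helpers and signatures (e.g. the
vmx_flush_tlb() arguments, construct_eptp()) may differ between kernel
versions:

static void vmx_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
{
        unsigned long guest_cr3 = cr3;

        if (enable_ept) {
                /* Program the EPTP for the new root (EPT01 or EPT02). */
                vmcs_write64(EPT_POINTER, construct_eptp(vcpu, cr3));
                /* (guest_cr3 / PDPTR handling elided) */
        }

        /*
         * The unconditional flush introduced by 2384d2b32640. This is
         * what makes every EPT01 <-> EPT02 switch pay for a TLB flush,
         * even though TLB entries are already tagged with VPID and EPTP.
         */
        vmx_flush_tlb(vcpu, true);

        vmcs_writel(GUEST_CR3, guest_cr3);
}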
Therefore, I think the real question is why vmx_flush_tlb() is done in
vmx_set_cr3() to begin with. If it didn't flush there, the TLB wouldn't be
flushed when switching between EPT01 and EPT02 and vice versa, even though
kvm_mmu_unload() is called (similar to what is done when switching CR3
values on the standard shadow MMU).

Thoughts?

-Liran