On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> Whenever KVM knows the page role flags have changed, it needs to drop
> the current MMU root and possibly load one from the prev_roots cache.
> Currently it is papering over some overly simplistic code by just
> dropping _all_ roots, so that the root will be reloaded by
> kvm_mmu_reload, but this has bad performance for the TDP MMU
> (which drops the whole of the page tables when freeing a root,
> without the performance safety net of a hash table).
>
> To do this, KVM needs to do a more kvm_mmu_update_root call from
> kvm_mmu_reset_context.  Introduce a new request bit so that the call
> can be delayed until after a possible KVM_REQ_MMU_RELOAD, which would
> kill all hopes of finding a cached PGD.
>
> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> ---

Please no.  I really, really do not want to add yet another deferred-load in
the nested virtualization paths.  As Jim pointed out[1],
KVM_REQ_GET_NESTED_STATE_PAGES should never have been merged.  And on that
point, I've no idea how this new request will interact with
KVM_REQ_GET_NESTED_STATE_PAGES.  It may be a complete non-issue, but I'd
honestly rather not have to spend the brain power.

And I still do not like the approach of converting kvm_mmu_reset_context()
wholesale to not doing kvm_mmu_unload().  There are currently eight
kvm_mmu_reset_context() calls:

  1.   nested_vmx_restore_host_state() - Only for a missed VM-Entry => VM-Fail
       consistency check, not at all a performance concern.

  2.   kvm_mmu_after_set_cpuid() - Still needs to unload.  Not a perf concern.

  3.   kvm_vcpu_reset() - Relevant only to INIT.  Not a perf concern, but could
       be converted manually to a different path without too much fuss.

  4+5. enter_smm() / kvm_smm_changed() - IMO, not a perf concern, but again
       could be converted manually if anyone cares.

  6.   set_efer() - Silly corner case that basically requires host userspace
       abuse of KVM APIs.  Not a perf concern.

  7+8. kvm_post_set_cr0/4() - These are the ones we really care about, and
       they can be handled quite trivially, and can even share much of the
       logic with kvm_set_cr3().

I strongly prefer that we take a more conservative approach and fix 7+8, and
then tackle 1, 3, and 4+5 separately if someone cares enough about those flows
to avoid dropping roots.

Regarding KVM_REQ_MMU_RELOAD, that mess mostly goes away with my series[2] to
replace that with KVM_REQ_MMU_FREE_OBSOLETE_ROOTS.  Obsolete TDP MMU roots will
never get a cache hit because the obsolete root will have an "invalid" role.
And if we care about optimizing this with respect to a memslot (highly
unlikely), then we could add an MMU generation check in the cache lookup.

I was planning on posting that series as soon as this one is queued, but I'm
more than happy to speculatively send a refreshed version that applies on top
of this series.

[1] https://lore.kernel.org/all/CALMp9eT2cP7kdptoP3=acJX+5_Wg6MXNwoDh42pfb21-wdXvJg@xxxxxxxxxxxxxx
[2] https://lore.kernel.org/all/20211209060552.2956723-1-seanjc@xxxxxxxxxx
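
To illustrate the point about obsolete roots never getting a cache hit, here is
a minimal user-space sketch.  It is not the actual KVM code: struct page_role,
struct cached_root, find_cached_root(), mark_root_obsolete() and the mmu_gen
field are all hypothetical, heavily simplified stand-ins, and the generation
comparison only models the optional check mentioned above.

```c
/*
 * Simplified model of a prev_roots-style cache lookup.
 * All types and helpers are hypothetical, not the real KVM definitions.
 */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PREV_ROOTS 3

struct page_role {
	uint32_t level   : 4;
	uint32_t smm     : 1;
	uint32_t invalid : 1;	/* set once the root has been made obsolete */
};

struct cached_root {
	uint64_t pgd;		/* guest PGD this root was built for */
	struct page_role role;	/* role the root was created with */
	uint64_t mmu_gen;	/* MMU generation at creation (assumed field) */
	bool present;		/* slot holds a root at all */
};

static bool role_equal(struct page_role a, struct page_role b)
{
	return a.level == b.level && a.smm == b.smm && a.invalid == b.invalid;
}

/* Mark a cached root obsolete by flipping the "invalid" bit in its role. */
void mark_root_obsolete(struct cached_root *root)
{
	assert(root != NULL);
	root->role.invalid = 1;
}

/*
 * Return the index of a usable cached root, or -1 on a miss.  The lookup
 * always asks for a valid role, so a root whose role has "invalid" set can
 * never compare equal and therefore never hits.
 */
int find_cached_root(const struct cached_root *cache, size_t nr,
		     uint64_t pgd, struct page_role want, uint64_t cur_gen)
{
	size_t i;

	assert(cache != NULL);
	want.invalid = 0;

	for (i = 0; i < nr; i++) {
		if (!cache[i].present || cache[i].pgd != pgd)
			continue;
		if (!role_equal(cache[i].role, want))
			continue;
		/* Optional: reject roots from an older MMU generation. */
		if (cache[i].mmu_gen != cur_gen)
			continue;
		return (int)i;
	}
	return -1;
}
```

In this model, marking a root obsolete requires no separate bookkeeping in the
lookup path; the role comparison alone turns the entry into a guaranteed miss.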