On Sat, 2022-02-19 at 08:54 +0100, Paolo Bonzini wrote:
> On 2/18/22 22:45, Sean Christopherson wrote:
> > On Thu, Feb 17, 2022, Paolo Bonzini wrote:
> > > Whenever KVM knows the page role flags have changed, it needs to drop
> > > the current MMU root and possibly load one from the prev_roots cache.
> > > Currently it is papering over some overly simplistic code by just
> > > dropping _all_ roots, so that the root will be reloaded by
> > > kvm_mmu_reload, but this has bad performance for the TDP MMU
> > > (which drops the whole of the page tables when freeing a root,
> > > without the performance safety net of a hash table).
> > >
> > > To do this, KVM needs to do a more kvm_mmu_update_root call from
> > > kvm_mmu_reset_context. Introduce a new request bit so that the call
> > > can be delayed until after a possible KVM_REQ_MMU_RELOAD, which would
> > > kill all hopes of finding a cached PGD.
> > >
> > > Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
> > > ---
> >
> > Please no.
> >
> > I really, really do not want to add yet another deferred-load in the nested
> > virtualization paths.
>
> This is not a deferred load, is it? It's only kvm_mmu_new_pgd that is
> deferred, but the PDPTR load is not.
>
> I think I should first merge patches 1-13, then revisit the root_role
> series (which only depends on the fast_pgd_switch and caching changes),
> and then finally get back to this final part. The reason is that
> root_role is what enables the stale-root check that you wanted; and it's
> easier to think about loading the guest PGD post-kvm_init_mmu if I can
> show you the direction I'd like to have in general, and not leave things
> half-done.
>
> (Patch 17 is also independent and perhaps fixing a case of premature
> optimization, so I'm inclined to merge it as well).
>
> > As Jim pointed out[1], KVM_REQ_GET_NESTED_STATE_PAGES should never have been merged.
> > And on that point, I've no idea how this new request will interact with
> > KVM_REQ_GET_NESTED_STATE_PAGES. It may be a complete non-issue, but
> > I'd honestly rather not have to spend the brain power.
>
> Fair enough on the interaction, but I still think
> KVM_REQ_GET_NESTED_STATE_PAGES is a good idea. I don't think KVM should
> access guest memory outside KVM_RUN, though there may be cases (possibly
> some PV MSRs, if I had to guess) where it does.

KVM_REQ_GET_NESTED_STATE_PAGES is a real source of bugs and a burden to maintain.
I have fixed too many bugs in it already, and it will only get worse with time.
Not to mention that, without any proper tests, we are bound to access guest memory
when setting the nested state without anybody noticing.

Best regards,
	Maxim Levitsky

>
> > And I still do not like the approach of converting kvm_mmu_reset_context() wholesale
> > to not doing kvm_mmu_unload(). There are currently eight kvm_mmu_reset_context() calls:
> >
> > 1. nested_vmx_restore_host_state() - Only for a missed VM-Entry => VM-Fail
> >    consistency check, not at all a performance concern.
> >
> > 2. kvm_mmu_after_set_cpuid() - Still needs to unload. Not a perf concern.
> >
> > 3. kvm_vcpu_reset() - Relevant only to INIT. Not a perf concern, but could be
> >    converted manually to a different path without too much fuss.
> >
> > 4+5. enter_smm() / kvm_smm_changed() - IMO, not a perf concern, but again could
> >    be converted manually if anyone cares.
> >
> > 6. set_efer() - Silly corner case that basically requires host userspace abuse
> >    of KVM APIs. Not a perf concern.
> >
> > 7+8. kvm_post_set_cr0/4() - These are the ones we really care about, and they
> >    can be handled quite trivially, and can even share much of the logic with
> >    kvm_set_cr3().
> >
> > I strongly prefer that we take a more conservative approach and fix 7+8, and then
> > tackle 1, 3, and 4+5 separately if someone cares enough about those flows to avoid
> > dropping roots.
>
> The thing is, I want to get rid of kvm_mmu_reset_context() altogether.
> I dislike the fact that it kills the roots but still keeps them in the
> hash table, thus relying on separate syncing to avoid future bugs. It's
> very unintuitive what is "reset" and what isn't.
>
> > Regarding KVM_REQ_MMU_RELOAD, that mess mostly goes away with my series to replace
> > that with KVM_REQ_MMU_FREE_OBSOLETE_ROOTS. Obsolete TDP MMU roots will never get
> > a cache hit because the obsolete root will have an "invalid" role. And if we care
> > about optimizing this with respect to a memslot (highly unlikely), then we could
> > add an MMU generation check in the cache lookup. I was planning on posting that
> > series as soon as this one is queued, but I'm more than happy to speculatively send
> > a refreshed version that applies on top of this series.
>
> Yes, please send a version on top of patches 1-13. That can be reviewed
> and committed in parallel with the root_role changes.
>
> Paolo
>
> > [1] https://lore.kernel.org/all/CALMp9eT2cP7kdptoP3=acJX+5_Wg6MXNwoDh42pfb21-wdXvJg@xxxxxxxxxxxxxx
> > [2] https://lore.kernel.org/all/20211209060552.2956723-1-seanjc@xxxxxxxxxx
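
For readers following the cached-root argument above, here is a minimal userspace sketch of the idea that an obsolete root "will never get a cache hit" once its role is marked invalid. It is an illustration only, not the kernel's code: struct role, struct cached_root, struct root_cache, NUM_PREV_ROOTS, root_matches(), mark_all_obsolete() and switch_root() are all hypothetical names invented for this sketch, and the real prev_roots handling in KVM differs in detail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PREV_ROOTS 3	/* hypothetical cache size for this sketch */

/* Hypothetical, simplified stand-in for a page role. */
struct role {
	uint32_t word;		/* packed role bits (paging mode, level, ...) */
	bool invalid;		/* set once the root is obsolete */
};

struct cached_root {
	uint64_t pgd;		/* guest PGD this root was built for */
	uint64_t hpa;		/* host address of the root; 0 means empty slot */
	struct role role;
};

struct root_cache {
	struct cached_root current;
	struct cached_root prev[NUM_PREV_ROOTS];
};

/* A root is reusable only if it is populated, not obsolete, and matches
 * both the requested PGD and the requested role. */
static bool root_matches(const struct cached_root *r, uint64_t pgd,
			 struct role want)
{
	return r->hpa != 0 && !r->role.invalid &&
	       r->pgd == pgd && r->role.word == want.word;
}

/* Mark every root obsolete without freeing anything; later lookups miss. */
static void mark_all_obsolete(struct root_cache *c)
{
	c->current.role.invalid = true;
	for (size_t i = 0; i < NUM_PREV_ROOTS; i++)
		c->prev[i].role.invalid = true;
}

/* Try to switch to a cached root. Returns true on a hit, in which case the
 * old current root is demoted into the slot that held the hit. */
static bool switch_root(struct root_cache *c, uint64_t pgd, struct role want)
{
	assert(!want.invalid);	/* callers never ask for an obsolete role */

	if (root_matches(&c->current, pgd, want))
		return true;

	for (size_t i = 0; i < NUM_PREV_ROOTS; i++) {
		if (root_matches(&c->prev[i], pgd, want)) {
			struct cached_root tmp = c->current;

			c->current = c->prev[i];
			c->prev[i] = tmp;
			return true;
		}
	}
	return false;	/* miss: caller would have to build a new root */
}
```

In this sketch a role change alone is enough to miss on the current root while still allowing a hit on a previous one, and marking roots obsolete makes every lookup miss without touching the lookup code. An MMU generation check, as mentioned above, would be one more field compared in root_matches().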