Re: [PATCH v2 16/18] KVM: x86: introduce KVM_REQ_MMU_UPDATE_ROOT

Paolo Bonzini <pbonzini@xxxxxxxxxx> · Sat, 19 Feb 2022 08:54:46 +0100

On 2/18/22 22:45, Sean Christopherson wrote:
On Thu, Feb 17, 2022, Paolo Bonzini wrote:
Whenever KVM knows the page role flags have changed, it needs to drop
the current MMU root and possibly load one from the prev_roots cache.
Currently it is papering over some overly simplistic code by just
dropping _all_ roots, so that the root will be reloaded by
kvm_mmu_reload, but this has bad performance for the TDP MMU
(which drops the whole of the page tables when freeing a root,
without the performance safety net of a hash table).

To do this, KVM needs to do a kvm_mmu_update_root call from
kvm_mmu_reset_context.  Introduce a new request bit so that the call
can be delayed until after a possible KVM_REQ_MMU_RELOAD, which would
kill all hopes of finding a cached PGD.
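
The ordering argument above can be illustrated with a toy model. This is only a sketch under loud assumptions: `toy_vcpu`, the `TOY_REQ_*` bits and every helper below are hypothetical stand-ins, not the KVM structures or the code in this patch. It shows only why a deferred root update has to be serviced after a pending reload: the reload empties the previous-roots cache, so a lookup done before it would be wasted and a lookup done after it cannot hit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request bits; only the names echo the thread. */
#define TOY_REQ_MMU_RELOAD       (1u << 0)
#define TOY_REQ_MMU_UPDATE_ROOT  (1u << 1)

#define TOY_NUM_PREV_ROOTS 3
#define TOY_INVALID_ROOT   UINT64_MAX

/* Toy vCPU: one active root plus a small cache of previous roots. */
struct toy_vcpu {
	uint32_t requests;
	uint64_t guest_pgd;                      /* PGD the guest wants next */
	uint64_t cur_root;
	uint64_t prev_roots[TOY_NUM_PREV_ROOTS];
	unsigned int rebuilds;                   /* roots built from scratch */
};

/* Test-and-clear a pending request bit. */
static bool toy_check_request(struct toy_vcpu *v, uint32_t req)
{
	bool pending = (v->requests & req) != 0;

	v->requests &= ~req;
	return pending;
}

/* Models a full reload: the active root and the whole cache go away. */
static void toy_drop_all_roots(struct toy_vcpu *v)
{
	v->cur_root = TOY_INVALID_ROOT;
	for (int i = 0; i < TOY_NUM_PREV_ROOTS; i++)
		v->prev_roots[i] = TOY_INVALID_ROOT;
}

/* Switch to guest_pgd, reusing a cached root when one matches. */
static void toy_update_root(struct toy_vcpu *v)
{
	if (v->cur_root == v->guest_pgd)
		return;

	for (int i = 0; i < TOY_NUM_PREV_ROOTS; i++) {
		if (v->prev_roots[i] == v->guest_pgd) {
			/* Cache hit: swap the cached root with the active one. */
			v->prev_roots[i] = v->cur_root;
			v->cur_root = v->guest_pgd;
			return;
		}
	}

	/* Cache miss: park the old root (if any) and build a new one. */
	if (v->cur_root != TOY_INVALID_ROOT)
		v->prev_roots[0] = v->cur_root;
	v->cur_root = v->guest_pgd;
	v->rebuilds++;
}

/* The reload is serviced first, so the update sees the final cache state. */
static void toy_service_requests(struct toy_vcpu *v)
{
	if (toy_check_request(v, TOY_REQ_MMU_RELOAD))
		toy_drop_all_roots(v);
	if (toy_check_request(v, TOY_REQ_MMU_UPDATE_ROOT))
		toy_update_root(v);
}

/* Run one scenario and report how many roots had to be rebuilt. */
static unsigned int toy_rebuilds_for(uint32_t requests)
{
	struct toy_vcpu v = {
		.requests = requests,
		.guest_pgd = 0x2000,
		.cur_root = 0x1000,
		.prev_roots = { 0x2000, TOY_INVALID_ROOT, TOY_INVALID_ROOT },
	};

	toy_service_requests(&v);
	assert(v.cur_root == 0x2000);
	return v.rebuilds;
}
```

With only the update request pending, `toy_rebuilds_for()` reports no rebuild because the cached root is reused; with a reload pending as well, the cache is gone by the time the update runs and one rebuild is needed.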

Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
---

Please no.

I really, really do not want to add yet another deferred-load in the nested
virtualization paths.

This is not a deferred load, is it?  It's only kvm_mmu_new_pgd that is 
deferred, but the PDPTR load is not.

I think I should first merge patches 1-13, then revisit the root_role 
series (which only depends on the fast_pgd_switch and caching changes), 
and then finally get back to this final part.  The reason is that 
root_role is what enables the stale-root check that you wanted; and it's 
easier to think about loading the guest PGD post-kvm_init_mmu if I can 
show you the direction I'd like to have in general, and not leave things 
half-done.

(Patch 17 is also independent and perhaps fixes a case of premature 
optimization, so I'm inclined to merge it as well.)

As Jim pointed out[1], KVM_REQ_GET_NESTED_STATE_PAGES should never have been
merged.  And on that point, I've no idea how this new request will interact with
KVM_REQ_GET_NESTED_STATE_PAGES.  It may be a complete non-issue, but I'd honestly
rather not have to spend the brain power.

Fair enough on the interaction, but I still think 
KVM_REQ_GET_NESTED_STATE_PAGES is a good idea.  I don't think KVM should 
access guest memory outside KVM_RUN, though there may be cases (possibly 
some PV MSRs, if I had to guess) where it does.

And I still do not like the approach of converting kvm_mmu_reset_context() wholesale
to not doing kvm_mmu_unload().  There are currently eight kvm_mmu_reset_context() calls:

   1.   nested_vmx_restore_host_state() - Only for a missed VM-Entry => VM-Fail
        consistency check, not at all a performance concern.

   2.   kvm_mmu_after_set_cpuid() - Still needs to unload.  Not a perf concern.

   3.   kvm_vcpu_reset() - Relevant only to INIT.  Not a perf concern, but could be
        converted manually to a different path without too much fuss.

   4+5. enter_smm() / kvm_smm_changed() - IMO, not a perf concern, but again could
        be converted manually if anyone cares.

   6.   set_efer() - Silly corner case that basically requires host userspace abuse
        of KVM APIs.  Not a perf concern.

   7+8. kvm_post_set_cr0/4() - These are the ones we really care about, and they
        can be handled quite trivially, and can even share much of the logic with
        kvm_set_cr3().

I strongly prefer that we take a more conservative approach and fix 7+8, and then
tackle 1, 3, and 4+5 separately if someone cares enough about those flows to avoid
dropping roots.

The thing is, I want to get rid of kvm_mmu_reset_context() altogether. 
I dislike the fact that it kills the roots but still keeps them in the 
hash table, thus relying on separate syncing to avoid future bugs.  It's 
very unintuitive what is "reset" and what isn't.

Regarding KVM_REQ_MMU_RELOAD, that mess mostly goes away with my series[2] to replace
that with KVM_REQ_MMU_FREE_OBSOLETE_ROOTS.  Obsolete TDP MMU roots will never get
a cache hit because the obsolete root will have an "invalid" role.  And if we care
about optimizing this with respect to a memslot (highly unlikely), then we could
add an MMU generation check in the cache lookup.  I was planning on posting that
series as soon as this one is queued, but I'm more than happy to speculatively send
a refreshed version that applies on top of this series.
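
The cache-lookup idea in the paragraph above can be sketched as follows. Everything here is an assumption made for illustration: `toy_role`, `toy_root`, `toy_find_cached_root()` and the field layout are hypothetical and are not the kernel's page-role or root-cache definitions. The sketch shows only the mechanism: a lookup always asks for a valid role, so a root whose role has been marked invalid can never match, and an optional generation comparison filters out roots built under an older memslot generation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified page role. */
struct toy_role {
	uint8_t level;
	bool direct;
	bool invalid;   /* set when the root becomes obsolete */
};

/* Hypothetical cached root: PGD, role, and the generation it was built in. */
struct toy_root {
	uint64_t pgd;
	struct toy_role role;
	uint64_t mmu_gen;
};

static bool toy_role_equal(struct toy_role a, struct toy_role b)
{
	return a.level == b.level && a.direct == b.direct &&
	       a.invalid == b.invalid;
}

/*
 * Return the index of a cached root matching pgd and the wanted role, or -1.
 * When check_gen is true, roots from another generation are skipped as well.
 */
static int toy_find_cached_root(const struct toy_root *roots, size_t n,
				uint64_t pgd, struct toy_role want,
				bool check_gen, uint64_t cur_gen)
{
	for (size_t i = 0; i < n; i++) {
		if (roots[i].pgd != pgd)
			continue;
		if (!toy_role_equal(roots[i].role, want))
			continue;   /* covers obsolete roots: invalid != want.invalid */
		if (check_gen && roots[i].mmu_gen != cur_gen)
			continue;
		return (int)i;
	}
	return -1;
}

static const struct toy_root toy_cache[] = {
	{ .pgd = 0x1000, .role = { 4, true, false }, .mmu_gen = 1 },
	{ .pgd = 0x2000, .role = { 4, true, true  }, .mmu_gen = 1 }, /* obsolete */
	{ .pgd = 0x3000, .role = { 4, true, false }, .mmu_gen = 0 }, /* old gen */
};

/* Lookups always request a valid role. */
static const struct toy_role toy_want = { 4, true, false };

/* Small self-check: an obsolete root is never returned. */
static void toy_demo(void)
{
	assert(toy_find_cached_root(toy_cache, 3, 0x2000, toy_want, false, 0) == -1);
}
```

In this model the role comparison alone is enough to keep obsolete roots out of the cache; the generation check is the optional extra step and only changes the outcome for the third entry.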

Yes, please send a version on top of patches 1-13.  That can be reviewed 
and committed in parallel with the root_role changes.

Paolo

[1] https://lore.kernel.org/all/CALMp9eT2cP7kdptoP3=acJX+5_Wg6MXNwoDh42pfb21-wdXvJg@xxxxxxxxxxxxxx
[2] https://lore.kernel.org/all/20211209060552.2956723-1-seanjc@xxxxxxxxxx