The TDP MMU has a performance regression compared to the legacy MMU
when CR0 changes often.  This was reported for the grsecurity kernel,
which uses CR0.WP to implement kernel W^X.  In that case, each change
to CR0.WP unloads the MMU and causes a lot of unnecessary work.  When
running nested, this can even cause L1 to hardly make progress, as the
L0 hypervisor is overwhelmed by the amount of MMU work that is needed.

Initially, my plan for this was to pull kvm_mmu_unload from
kvm_mmu_reset_context into kvm_init_mmu, so I started by separating
the CPU setup (CR0/CR4/EFER, SMM, guest mode, etc.) from the shadow
page table format.  Right now the "MMU role" is a messy mix of the two
and, whenever something is different between the MMU and the CPU, it
is stored as an extra field in struct kvm_mmu; for extra bonus
complication, sometimes the same thing is stored both in the role and
in an extra field.  The aim was to keep kvm_mmu_unload only if the MMU
role changed, and drop it if only the CPU role changed.

I even posted that cleanup, but it occurred to me later that even a
conditional kvm_mmu_unload in kvm_init_mmu would be overkill.
kvm_mmu_unload is only needed in the rare cases where a TLB flush is
needed (e.g. CR0.PG changing from 1 to 0) or where the guest page
table interpretation changes in a way not captured by the role (that
is, CPUID changes).  However, the implementation of fast PGD switching
is subtle, and requires a call to kvm_mmu_new_pgd (and therefore
knowing the new MMU role) before kvm_init_mmu, so
kvm_mmu_reset_context chickens out and drops all the roots.

Therefore, the meat of this series is a reorganization of fast PGD
switching; it makes it possible to call kvm_mmu_new_pgd *after* the
MMU has been set up, using the MMU role instead of
kvm_mmu_calc_root_page_role.

Patches 1 to 3 are bugfixes found while working on the series.

Patches 4 and 5 add more sanity checks that triggered a lot during
development.

Patches 6 and 7 are related cleanups.  In particular, patch 7 makes
the cache lookup code a bit more pleasant.

Patches 8 and 9 rework the fast PGD switching.

Patches 10 and 11 are cleanups enabled by the rework, and the only
survivors of the CPU role patchset.

Finally, patch 12 optimizes kvm_mmu_reset_context.

Paolo

Paolo Bonzini (12):
  KVM: x86: host-initiated EFER.LME write affects the MMU
  KVM: MMU: move MMU role accessors to header
  KVM: x86: do not deliver asynchronous page faults if CR0.PG=0
  KVM: MMU: WARN if PAE roots linger after kvm_mmu_unload
  KVM: MMU: avoid NULL-pointer dereference on page freeing bugs
  KVM: MMU: rename kvm_mmu_reload
  KVM: x86: use struct kvm_mmu_root_info for mmu->root
  KVM: MMU: do not consult levels when freeing roots
  KVM: MMU: look for a cached PGD when going from 32-bit to 64-bit
  KVM: MMU: load new PGD after the shadow MMU is initialized
  KVM: MMU: remove kvm_mmu_calc_root_page_role
  KVM: x86: do not unload MMU roots on all role changes

 arch/x86/include/asm/kvm_host.h |   3 +-
 arch/x86/kvm/mmu.h              |  28 +++-
 arch/x86/kvm/mmu/mmu.c          | 253 ++++++++++++++++----------------
 arch/x86/kvm/mmu/mmu_audit.c    |   4 +-
 arch/x86/kvm/mmu/paging_tmpl.h  |   2 +-
 arch/x86/kvm/mmu/tdp_mmu.c      |   2 +-
 arch/x86/kvm/mmu/tdp_mmu.h      |   2 +-
 arch/x86/kvm/svm/nested.c       |   6 +-
 arch/x86/kvm/vmx/nested.c       |   8 +-
 arch/x86/kvm/vmx/vmx.c          |   2 +-
 arch/x86/kvm/x86.c              |  39 +++--
 11 files changed, 190 insertions(+), 159 deletions(-)

-- 
2.31.1
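
P.S. To illustrate the reordering at the heart of patches 8 to 12,
here is a minimal before/after sketch of the call sequence at a PGD
switch.  It is not lifted from the series itself; the actual call
sites (the nested SVM/VMX transitions and kvm_mmu_reset_context) and
the exact signatures may differ:

	/*
	 * Old ordering: the fast PGD switch has to happen before the
	 * MMU is reinitialized, so the new role must be recomputed
	 * separately via kvm_mmu_calc_root_page_role().
	 */
	kvm_mmu_new_pgd(vcpu, new_cr3);
	kvm_init_mmu(vcpu);

	/*
	 * New ordering: initialize the MMU first, then switch the PGD
	 * using the role already stored in the freshly set up MMU;
	 * kvm_mmu_calc_root_page_role() becomes unnecessary (patch 11).
	 */
	kvm_init_mmu(vcpu);
	kvm_mmu_new_pgd(vcpu, new_cr3);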