This series resurrects the missing pieces of Paolo's previous attempt[1]
to avoid needless unloading of MMU roots. The performance gap between
TDP and the legacy MMU still exists. It is especially noticeable under
grsecurity, which implements kernel W^X by toggling CR0.WP, and that
happens very frequently.

Patches 1-13 and 17 of the old series were merged but, unfortunately,
the remaining parts never saw a v3. I therefore picked these up, took
Sean's feedback into account[2] and simplified the whole approach to
explicitly handle just the case we care most about.

Patch 1 is a v3 of [3], addressing Sean's feedback.

Patch 2 is specifically useful for grsecurity, as handle_cr() is by far
*the* top vmexit reason.

Patch 3 is the most important one, as it skips unloading the MMU roots
for CR0.WP toggling.

While patches 1 and 2 already bring small performance improvements, the
big gains come from patch 3. However, as the performance impact is huge
(and my knowledge of KVM internals is limited), I may well have missed
an important aspect. The KVM tests ran fine, though, as did my manual
tests that explicitly poke at CR0.WP toggling corner cases.

Please give it a look!

This series builds on top of kvm.git/queue, namely commit de60733246ff
("Merge branch 'kvm-hw-enable-refactor' into HEAD").

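To illustrate the idea behind patch 3, here is a minimal standalone
sketch of how a CR0 write that flips only the WP bit can be told apart
from other CR0 changes. It is not the code from the patch: the helper
cr0_only_wp_changed(), the enum and classify_cr0_write() are
hypothetical names made up for this example, and the conditions under
which the fast path is actually safe are decided by the patch itself
and are not modeled here. Only the CR0 bit positions are architectural.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural CR0 bit positions. */
#define X86_CR0_PE ((uint64_t)1 << 0)   /* Protection Enable */
#define X86_CR0_WP ((uint64_t)1 << 16)  /* Write Protect */
#define X86_CR0_PG ((uint64_t)1 << 31)  /* Paging */

/* Hypothetical classification of a guest CR0 write (illustration only). */
enum cr0_action {
	CR0_NOTHING,    /* value unchanged, nothing to do */
	CR0_FAST_PATH,  /* only WP toggled, MMU roots could be kept */
	CR0_FULL_RESET, /* anything else, take the slow path */
};

/* Hypothetical helper: true if old and new CR0 differ in exactly the WP bit. */
static bool cr0_only_wp_changed(uint64_t old_cr0, uint64_t new_cr0)
{
	return (old_cr0 ^ new_cr0) == X86_CR0_WP;
}

static enum cr0_action classify_cr0_write(uint64_t old_cr0, uint64_t new_cr0)
{
	if (old_cr0 == new_cr0)
		return CR0_NOTHING;
	if (cr0_only_wp_changed(old_cr0, new_cr0))
		return CR0_FAST_PATH;
	return CR0_FULL_RESET;
}
```

The XOR of the old and new values has a bit set for every bit that
changed, so comparing it against X86_CR0_WP matches a WP toggle in
either direction and rejects writes that change any other bit as well.
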
Thanks,
Mathias

[1] https://lore.kernel.org/kvm/20220217210340.312449-1-pbonzini@xxxxxxxxxx/
[2] https://lore.kernel.org/kvm/YhATewkkO%2Fl4P9UN@xxxxxxxxxx/
[3] https://lore.kernel.org/kvm/YhAB1d1%2FnQbx6yvk@xxxxxxxxxx/

Mathias Krause (2):
  KVM: VMX: avoid retpoline call for control register caused exits
  KVM: x86: do not unload MMU roots when only toggling CR0.WP

Paolo Bonzini (1):
  KVM: x86/mmu: avoid indirect call for get_cr3

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/mmu/mmu.c          | 38 +++++++++++++++++++++------------
 arch/x86/kvm/mmu/paging_tmpl.h  |  2 +-
 arch/x86/kvm/smm.c              |  4 ++--
 arch/x86/kvm/vmx/nested.c       |  2 +-
 arch/x86/kvm/vmx/vmx.c          |  2 ++
 arch/x86/kvm/x86.c              | 28 ++++++++++++++++--------
 7 files changed, 50 insertions(+), 28 deletions(-)

-- 
2.39.0