On 27/06/2018 23:59, Junaid Shahid wrote:
> Changes since v2:
> - CR3_PCID_INVD is replaced by X86_CR3_PCID_NOFLUSH
> - kvm_mmu_calc_root_page_role() and friends are no longer public
> - Simplified the race condition example in mmu_need_write_protect()
> - Added smp_load_acquire()s in kvm_mmu_sync_roots()
> - Ignored non-canonical addresses in vmx_flush_tlb_gva()
> - A couple of minor cleanups
>
> Changes since v1:
> - Renamed the flags returned by set_spte
> - Split up a couple of changes into separate patches and refactored some
>   other patches
> - .set_cr3() handlers never flush TLB rather than taking that as parameter
> - Generalized lockless CR3 switching to work across different MMU modes
> - Implemented lockless CR3/EPTP switching for nested VMX L1<->L2 switches
> - Added an LRU cache containing multiple fast-switchable roots instead
>   of limiting it to only the immediately previous one.
>
> The performance of shadow paging is severely degraded in some workloads
> when the guest kernel is using KPTI. This is primarily due to the vastly
> increased number of CR3 switches that result from KPTI.
>
> This patch series implements various optimizations to reduce some of this
> overhead. Compared to the baseline, this results in a reduction from
> ~16m12s to ~4m44s for a 4-VCPU kernel compile benchmark and from ~25m5s to
> ~14m50s for a 1-VCPU kernel compile benchmark.

Great!  The CPUID microbenchmark with 16 vCPUs is down by about 25%.
The remaining overhead comes from the get_user_pages calls in
nested_get_vmcs12_pages, which contend on pmd_lock.  We should be able
to cache those using the MMU notifier, similar to how the APIC access
page is already handled.

Thanks,

Paolo

> Junaid Shahid (18):
>   kvm: x86: Make sync_page() flush remote TLBs once only
>   kvm: x86: Avoid taking MMU lock in kvm_mmu_sync_roots if no sync is
>     needed
>   kvm: x86: Add fast CR3 switch code path
>   kvm: x86: Introduce kvm_mmu_calc_root_page_role()
>   kvm: x86: Introduce KVM_REQ_LOAD_CR3
>   kvm: x86: Add support for fast CR3 switch across different MMU modes
>   kvm: x86: Support resetting the MMU context without resetting roots
>   kvm: x86: Use fast CR3 switch for nested VMX
>   kvm: x86: Add ability to skip TLB flush when switching CR3
>   kvm: x86: Propagate guest PCIDs to host PCIDs
>   kvm: vmx: Support INVPCID in shadow paging mode
>   kvm: x86: Skip TLB flush on fast CR3 switch when indicated by guest
>   kvm: x86: Add a root_hpa parameter to kvm_mmu->invlpg()
>   kvm: x86: Support selectively freeing either current or previous MMU
>     root
>   kvm: x86: Skip shadow page resync on CR3 switch when indicated by
>     guest
>   kvm: x86: Flush only affected TLB entries in kvm_mmu_invlpg*
>   kvm: x86: Add multi-entry LRU cache for previous CR3s
>   kvm: x86: Remove CR3_PCID_INVD flag
>
>  arch/x86/include/asm/kvm_host.h |  32 ++-
>  arch/x86/kvm/emulate.c          |   2 +-
>  arch/x86/kvm/mmu.c              | 468 ++++++++++++++++++++++++++------
>  arch/x86/kvm/mmu.h              |  24 +-
>  arch/x86/kvm/paging_tmpl.h      |  18 +-
>  arch/x86/kvm/svm.c              |  12 +-
>  arch/x86/kvm/vmx.c              | 154 ++++++++++-
>  arch/x86/kvm/x86.c              |  18 +-
>  virt/kvm/kvm_main.c             |  14 +-
>  9 files changed, 630 insertions(+), 112 deletions(-)
>
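
For readers following along, the "LRU cache containing multiple
fast-switchable roots" from the v1 changelog can be pictured roughly as
below.  This is a minimal user-space sketch and not code from the
series: the names (root_cache, root_cache_switch, root_cache_install),
the cache size, and the INVALID_ROOT marker are all hypothetical, and
locking, role checks and freeing of evicted roots are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_PREV_ROOTS 3          /* hypothetical cache size */
#define INVALID_ROOT   UINT64_MAX /* hypothetical "no root built" marker */

struct root_entry {
	uint64_t cr3; /* guest CR3 value the root was built for */
	uint64_t hpa; /* host physical address of the shadow root */
};

struct root_cache {
	struct root_entry active;
	struct root_entry prev[NUM_PREV_ROOTS]; /* prev[0] = most recent */
};

static void root_cache_init(struct root_cache *c)
{
	size_t i;

	c->active.cr3 = 0;
	c->active.hpa = INVALID_ROOT;
	for (i = 0; i < NUM_PREV_ROOTS; i++) {
		c->prev[i].cr3 = 0;
		c->prev[i].hpa = INVALID_ROOT;
	}
}

/*
 * Switch to new_cr3.  Returns true if a cached root could be reused
 * (the fast path), false if the caller has to build a new root and
 * record it with root_cache_install().
 */
static bool root_cache_switch(struct root_cache *c, uint64_t new_cr3)
{
	struct root_entry carry = c->active;
	size_t i;

	if (c->active.hpa != INVALID_ROOT && c->active.cr3 == new_cr3)
		return true;

	/*
	 * Push the outgoing root down the list one slot at a time.  If
	 * the entry displaced from a slot matches new_cr3, it becomes
	 * the active root and the entries below it stay where they are.
	 */
	for (i = 0; i < NUM_PREV_ROOTS; i++) {
		struct root_entry tmp = c->prev[i];

		c->prev[i] = carry;
		carry = tmp;
		if (carry.hpa != INVALID_ROOT && carry.cr3 == new_cr3) {
			c->active = carry;
			return true;
		}
	}

	/* Miss: carry is the least recently used entry and is dropped. */
	c->active.cr3 = new_cr3;
	c->active.hpa = INVALID_ROOT;
	return false;
}

/* Record the root built by the caller after a miss. */
static void root_cache_install(struct root_cache *c, uint64_t hpa)
{
	assert(hpa != INVALID_ROOT);
	c->active.hpa = hpa;
}
```

With KPTI the guest alternates between two CR3 values per process, so
in this model every switch after the first two is a hit in prev[0].
The extra slots only matter once more than two address spaces are in
play.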