Precisely track (via kvm_mmu_page) if a non-huge page is being forced
and use that info to avoid unnecessarily forcing smaller page sizes in
disallowed_hugepage_adjust().

KVM incorrectly assumes that the NX huge page mitigation is the only
scenario where KVM will create a non-leaf page instead of a huge page.
As a result, if the original source of huge page incompatibility goes
away, the NX mitigation is enabled, and KVM encounters a present shadow
page when attempting to install a huge page, KVM will force a smaller
page regardless of whether or not a smaller page is actually necessary
to satisfy the NX huge page mitigation.

Unnecessarily forcing small pages can result in degraded guest
performance, especially on larger VMs.  The bug was originally
discovered when testing dirty log performance, as KVM would leave small
pages lying around when zapping collapsible SPTEs.  That case was
inadvertently fixed by commit 5ba7c4c6d1c7 ("KVM: x86/MMU: Zap non-leaf
SPTEs when disabling dirty logging"), but other scenarios are still
affected, e.g. KVM will not rebuild a huge page if the mmu_notifier zaps
a range of PTEs because the primary MMU is creating a huge page.

v3:
 - Bug the VM if KVM attempts to double account a shadow page that
   disallows an NX huge page. [David]
 - Split the rename to a separate patch. [Paolo]
 - Rename more NX huge page variables/functions. [David]
 - Combine and tweak the comments about enforcing the NX huge page
   mitigation for non-paging MMUs. [Paolo, David]
 - Call out that the shadow MMU holds mmu_lock for write and doesn't
   need to manually handle memory ordering when accounting NX huge
   pages. [David]
 - Add a smp_rmb() when unlinking shadow pages in the TDP MMU.
 - Rename spte_to_sp() to spte_to_child_sp(). [David]
 - Collect reviews. [David]
 - Tweak the changelog for the final patch to call out that precise
   accounting addresses real world performance bugs. [Paolo]
 - Reword the changelog for the patch to (almost) always tag disallowed
   NX huge pages, and call out that it doesn't fix the TDP MMU. [David]

v2: Rebase, tweak a changelog accordingly.

v1: https://lore.kernel.org/all/20220409003847.819686-1-seanjc@xxxxxxxxxx

Mingwei Zhang (1):
  KVM: x86/mmu: explicitly check nx_hugepage in
    disallowed_hugepage_adjust()

Sean Christopherson (7):
  KVM: x86/mmu: Bug the VM if KVM attempts to double count an NX huge
    page
  KVM: x86/mmu: Tag disallowed NX huge pages even if they're not tracked
  KVM: x86/mmu: Rename NX huge pages fields/functions for consistency
  KVM: x86/mmu: Properly account NX huge page workaround for nonpaging
    MMUs
  KVM: x86/mmu: Set disallowed_nx_huge_page in TDP MMU before setting
    SPTE
  KVM: x86/mmu: Track the number of TDP MMU pages, but not the actual
    pages
  KVM: x86/mmu: Add helper to convert SPTE value to its shadow page

 arch/x86/include/asm/kvm_host.h |  19 ++---
 arch/x86/kvm/mmu/mmu.c          | 123 +++++++++++++++++++++-----------
 arch/x86/kvm/mmu/mmu_internal.h |  33 ++++-----
 arch/x86/kvm/mmu/paging_tmpl.h  |   6 +-
 arch/x86/kvm/mmu/spte.c         |  12 ++++
 arch/x86/kvm/mmu/spte.h         |  17 +++++
 arch/x86/kvm/mmu/tdp_mmu.c      |  59 ++++++++++-----
 arch/x86/kvm/mmu/tdp_mmu.h      |   2 +
 8 files changed, 178 insertions(+), 93 deletions(-)

base-commit: 93472b79715378a2386598d6632c654a2223267b
-- 
2.37.1.559.g78731f0fdb-goog
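
Illustration only (not part of the series): the behavior described in
the cover letter can be modeled in standalone C.  This is a simplified
sketch under stated assumptions, not the kernel implementation.  All
model_* names, the struct layouts, the level numbering and the
forced_by_nx_mitigation flag are hypothetical stand-ins; the only thing
taken from the series is the idea that the shadow page itself records
whether it exists because the NX mitigation disallowed a huge page, and
that disallowed_hugepage_adjust() should drop a level only in that case.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical page-table levels; x86-64 tables hold 512 entries each. */
enum model_level { MODEL_LEVEL_4K = 1, MODEL_LEVEL_2M = 2, MODEL_LEVEL_1G = 3 };

/* Hypothetical stand-in for the per-shadow-page tracking. */
struct model_shadow_page {
	/* True only if this page table was created because the NX
	 * mitigation disallowed a huge page at this position. */
	bool forced_by_nx_mitigation;
};

/* Hypothetical stand-in for a decoded SPTE. */
struct model_spte {
	bool present;
	bool huge;                             /* leaf SPTE mapping a huge page */
	const struct model_shadow_page *child; /* valid if present and !huge */
};

/* Hypothetical stand-in for the page fault being handled. */
struct model_fault {
	uint64_t gfn;
	uint64_t pfn;
	int goal_level;
};

/* Number of 4K pages covered by one entry at the given level. */
static uint64_t model_pages_per_hpage(int level)
{
	return 1ULL << ((level - 1) * 9);
}

/*
 * Returns true if the fault was adjusted to map at a smaller page size.
 * The final check is the point of the sketch: an existing non-leaf SPTE
 * alone is not a reason to give up on the huge page.
 */
static bool model_disallowed_hugepage_adjust(struct model_fault *fault,
					     const struct model_spte *spte,
					     int cur_level)
{
	uint64_t page_mask;

	if (cur_level <= MODEL_LEVEL_4K || cur_level != fault->goal_level)
		return false;
	if (!spte->present || spte->huge)
		return false;
	if (!spte->child->forced_by_nx_mitigation)
		return false;

	/* Go down one level and patch the next 9 gfn bits into the pfn. */
	page_mask = model_pages_per_hpage(cur_level) -
		    model_pages_per_hpage(cur_level - 1);
	fault->pfn |= fault->gfn & page_mask;
	fault->goal_level--;
	return true;
}
```

In this model, a fault that targets a 2M mapping and finds a present
non-leaf SPTE is demoted to 4K only when the child shadow page carries
the flag.  A leftover page table that was created for some other reason
leaves the goal level untouched, so the caller is free to replace it
with a huge page.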