Recheck that iter.old_spte still points to a page table when recovering
huge pages. Since mmu_lock is held for read and tdp_iter_step_up()
re-reads iter.sptep, it's possible the SPTE was zapped or recovered by
another CPU in between stepping down and stepping back up.

This avoids a useless cmpxchg (and possibly a remote TLB flush) if
another CPU is recovering huge SPTEs in parallel, e.g. the NX huge page
recovery worker, or vCPUs taking faults on the huge page region.

This also makes it clear that tdp_iter_step_up() re-reads the SPTE and
thus can see a different value, which is not immediately obvious when
reading the code.

Signed-off-by: David Matlack <dmatlack@xxxxxxxxxx>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 07d5363c9db7..bdc7fd476721 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1619,6 +1619,17 @@ static void recover_huge_pages_range(struct kvm *kvm,
 		while (max_mapping_level > iter.level)
 			tdp_iter_step_up(&iter);
 
+		/*
+		 * Re-check that iter.old_spte still points to a page table.
+		 * Since mmu_lock is held for read and tdp_iter_step_up()
+		 * re-reads iter.sptep, it's possible the SPTE was zapped or
+		 * recovered by another CPU in between stepping down and
+		 * stepping back up.
+		 */
+		if (!is_shadow_present_pte(iter.old_spte) ||
+		    is_last_spte(iter.old_spte, iter.level))
+			continue;
+
 		if (!tdp_mmu_set_spte_atomic(kvm, &iter, huge_spte))
 			flush = true;
-- 
2.46.0.rc2.264.g509ed76dc8-goog