Recheck that iter.old_spte still points to a page table when recovering
huge pages. Since mmu_lock is held for read and tdp_iter_step_up()
re-reads iter.sptep, it's possible the SPTE was zapped or recovered by
another CPU in between stepping down and stepping back up.

This avoids a useless cmpxchg (and possibly a remote TLB flush) if
another CPU is recovering huge SPTEs in parallel, e.g. the NX huge page
recovery worker, or vCPUs taking faults on the huge page region.

This also makes it clear that tdp_iter_step_up() re-reads the SPTE and
thus can see a different value, which is not immediately obvious when
reading the code.

Signed-off-by: David Matlack <dmatlack@xxxxxxxxxx>
---
 arch/x86/kvm/mmu/tdp_mmu.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 07d5363c9db7..bdc7fd476721 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1619,6 +1619,17 @@ static void recover_huge_pages_range(struct kvm *kvm,
 		while (max_mapping_level > iter.level)
 			tdp_iter_step_up(&iter);
 
+		/*
+		 * Re-check that iter.old_spte still points to a page table.
+		 * Since mmu_lock is held for read and tdp_iter_step_up()
+		 * re-reads iter.sptep, it's possible the SPTE was zapped or
+		 * recovered by another CPU in between stepping down and
+		 * stepping back up.
+		 */
+		if (!is_shadow_present_pte(iter.old_spte) ||
+		    is_last_spte(iter.old_spte, iter.level))
+			continue;
+
 		if (!tdp_mmu_set_spte_atomic(kvm, &iter, huge_spte))
 			flush = true;
-- 
2.46.0.rc2.264.g509ed76dc8-goog