Re: [PATCH v4 18/30] KVM: x86/mmu: Zap only TDP MMU leafs in kvm_zap_gfn_range()

Sean Christopherson <seanjc@xxxxxxxxxx> · Fri, 4 Mar 2022 16:11:04 +0000

On Fri, Mar 04, 2022, Mingwei Zhang wrote:
> On Thu, Mar 03, 2022, Paolo Bonzini wrote:
> > diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
> > index f3939ce4a115..c71debdbc732 100644
> > --- a/arch/x86/kvm/mmu/tdp_mmu.c
> > +++ b/arch/x86/kvm/mmu/tdp_mmu.c
> > @@ -834,10 +834,8 @@ bool kvm_tdp_mmu_zap_sp(struct kvm *kvm, struct kvm_mmu_page *sp)
> >  }
> >  
> >  /*
> > - * Tears down the mappings for the range of gfns, [start, end), and frees the
> > - * non-root pages mapping GFNs strictly within that range. Returns true if
> > - * SPTEs have been cleared and a TLB flush is needed before releasing the
> > - * MMU lock.
> > + * Zap leafs SPTEs for the range of gfns, [start, end). Returns true if SPTEs
> > + * have been cleared and a TLB flush is needed before releasing the MMU lock.
> 
> I think the original code does not _over_ zap, but the new version
> does.

No, the new version doesn't overzap.

> Will that have some side effects? In particular, if the range is
> within a huge page (or HugeTLB page of various sizes), then we choose to
> zap it even if it is more than the range.

The old version did that too.  KVM _must_ zap a hugepage that overlaps the range,
otherwise the guest would be able to access memory that has been freed/moved.  If
the operation has unmapped a subset of a hugepage, KVM needs to zap and rebuild
the portions that are still valid using smaller pages.
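
To make the overlap rule concrete, here is a minimal standalone sketch of the
arithmetic.  It is not kernel code: spte_overlaps_range() and
pages_still_valid() are hypothetical helpers invented for illustration, and
"pages" stands in for what KVM_PAGES_PER_HPAGE(level) would return.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/*
 * Hypothetical helper: true if a leaf SPTE mapping [gfn, gfn + pages)
 * intersects the zap range [start, end) at all.
 */
static bool spte_overlaps_range(gfn_t gfn, uint64_t pages,
				gfn_t start, gfn_t end)
{
	return gfn < end && gfn + pages > start;
}

/*
 * Hypothetical helper: number of 4KiB pages under the SPTE that fall
 * outside [start, end), i.e. the part that is still valid and would be
 * rebuilt with smaller pages after the hugepage is zapped.
 */
static uint64_t pages_still_valid(gfn_t gfn, uint64_t pages,
				  gfn_t start, gfn_t end)
{
	gfn_t lo, hi;

	if (!spte_overlaps_range(gfn, pages, start, end))
		return pages;

	/* Clamp the zap range to the SPTE's own range. */
	lo = start > gfn ? start : gfn;
	hi = end < gfn + pages ? end : gfn + pages;

	return pages - (hi - lo);
}
```

E.g. for a 2MiB page at gfn 0x200 (512 pages) and a zap range of
[0x210, 0x220), the page overlaps the range and so must be zapped as a whole,
even though only 16 of its 512 pages are actually in the range.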

> Regardless of side effect, I think we probably should mention that in
> the comments?
> > -		/*
> > -		 * If this is a non-last-level SPTE that covers a larger range
> > -		 * than should be zapped, continue, and zap the mappings at a
> > -		 * lower level, except when zapping all SPTEs.
> > -		 */
> > -		if (!zap_all &&
> > -		    (iter.gfn < start ||
> > -		     iter.gfn + KVM_PAGES_PER_HPAGE(iter.level) > end) &&
> > +		if (!is_shadow_present_pte(iter.old_spte) ||
> >  		    !is_last_spte(iter.old_spte, iter.level))

It's hard to see in the diff, but the key is the "!is_last_spte()" check.  The
old check skipped non-leaf SPTEs, i.e. SPTEs that point at shadow pages, only if
they weren't fully contained in the range.  The new version _always_ skips
non-leaf SPTEs.  Hugepage SPTEs always return true for is_last_spte() and so are
never skipped, in either version.
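
If it helps, the two skip conditions can be compared side by side in a small
standalone model.  This is only a sketch of the lines visible in the hunk
above: struct model_spte, old_skips() and new_skips() are made-up names, "last"
stands in for the result of is_last_spte(), and "pages" for
KVM_PAGES_PER_HPAGE(iter.level).  Any checks the old code did outside the
quoted lines are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t gfn_t;

/* Hypothetical stand-in for one iterator position, not struct tdp_iter. */
struct model_spte {
	gfn_t gfn;	/* first gfn covered by the SPTE */
	uint64_t pages;	/* 4KiB pages covered at this level */
	bool present;	/* models is_shadow_present_pte() */
	bool last;	/* models is_last_spte(): true for 4KiB and huge leafs */
};

/* Removed condition: skip a non-leaf SPTE that is not fully in [start, end). */
static bool old_skips(const struct model_spte *s, gfn_t start, gfn_t end,
		      bool zap_all)
{
	return !zap_all &&
	       (s->gfn < start || s->gfn + s->pages > end) &&
	       !s->last;
}

/* Added condition: skip anything that is not a present leaf SPTE. */
static bool new_skips(const struct model_spte *s)
{
	return !s->present || !s->last;
}
```

In this model a present hugepage leaf that straddles the range is skipped by
neither condition, so both versions zap it.  The only behavioral difference is
for a present non-leaf SPTE that is fully contained in the range: the old
condition let it be zapped, the new one skips it and leaves only its leafs to
be zapped.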