On Wed, Jan 30, 2019 at 05:43:53PM +0100, Paolo Bonzini wrote:
> On 10/01/19 19:06, Sean Christopherson wrote:
> > Revert back to a dedicated (and slower) mechanism for handling the
> > scenario where all MMIO shadow PTEs need to be zapped due to
> > overflowing the MMIO generation number.  The MMIO generation scenario
> > is almost literally a one-in-a-million occurrence, i.e. is not a
> > performance sensitive scenario.
> >
> > Restoring kvm_mmu_zap_mmio_sptes() leaves VM teardown as the only user
> > of kvm_mmu_invalidate_zap_all_pages() and paves the way for removing
> > the fast invalidate mechanism altogether.
> >
> > This reverts commit a8eca9dcc656a405a28ffba43f3d86a1ff0eb331.
> >
> > Cc: Xiao Guangrong <guangrong.xiao@xxxxxxxxx>
> > Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> > ---

[...]

> > diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> > index bb56beb166e4..df41193aba38 100644
> > --- a/arch/x86/kvm/mmu.c
> > +++ b/arch/x86/kvm/mmu.c
> > @@ -386,6 +386,8 @@ static void mark_mmio_spte(struct kvm_vcpu *vcpu, u64 *sptep, u64 gfn,
> >  		mask |= (gpa & shadow_nonpresent_or_rsvd_mask)
> >  			<< shadow_nonpresent_or_rsvd_mask_len;
> >
> > +	page_header(__pa(sptep))->mmio_cached = true;
> > +
> >  	trace_mark_mmio_spte(sptep, gfn, access, gen);
> >  	mmu_spte_set(sptep, mask);
> >  }
> > @@ -5932,6 +5934,24 @@ static bool kvm_has_zapped_obsolete_pages(struct kvm *kvm)
> >  	return unlikely(!list_empty_careful(&kvm->arch.zapped_obsolete_pages));
> >  }
> >
> > +static void kvm_mmu_zap_mmio_sptes(struct kvm *kvm)
> > +{
> > +	struct kvm_mmu_page *sp, *node;
> > +	LIST_HEAD(invalid_list);
> > +
> > +	spin_lock(&kvm->mmu_lock);
> > +restart:
> > +	list_for_each_entry_safe(sp, node, &kvm->arch.active_mmu_pages, link) {
> > +		if (!sp->mmio_cached)
> > +			continue;
> > +		if (kvm_mmu_prepare_zap_page(kvm, sp, &invalid_list))
> > +			goto restart;
>
> This is potentially quadratic.  If the "- 150" hack is removed, I'd
> rather have a zap-all instead.
Why is the walk restarted after zapping children?  Commit 0738541396be
("KVM: MMU: awareness of new kvm_mmu_zap_page behaviour") added the
restart and simply states:

    kvm_mmu_zap_page will soon zap the unsynced children of a page.
    Restart list walk in such case.

Which isn't exactly illuminating, but it does imply that the MMIO case
should never need to restart since MMIO sptes can't have children.

The "unsynced children" behavior was added by commit 4731d4c7a077 ("KVM:
MMU: out of sync shadow core"), and at that time kvm_mmu_zap_page() only
returned the number of zapped unsynced children.  Commit 54a4f0239f2e
("KVM: MMU: make kvm_mmu_zap_page() return the number of pages it
actually freed") extended the return value to include the page itself,
which makes sense for the general case, but it means restarting the walk
on a non-zero return value is at least partially wrong, if not
completely unnecessary.
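To make Paolo's "potentially quadratic" point concrete: restarting from the
head on every non-zero return re-scans the untouched prefix of the list once
per zapped page.  Below is a toy model, not the real KVM code -- `struct
node`, `prepare_zap()`, and `zap_restarting()` are made-up stand-ins for
`struct kvm_mmu_page`, `kvm_mmu_prepare_zap_page()`, and the loop in the
patch -- that counts list-entry visits so the cost is observable:

```c
#include <stddef.h>
#include <assert.h>	/* for the self-checks below */

/* Minimal intrusive circular list in the style of <linux/list.h>. */
struct node {
	struct node *prev, *next;
	int mmio_cached;
};

static void list_init(struct node *head)
{
	head->prev = head->next = head;
}

static void list_add_tail(struct node *n, struct node *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

static void list_del(struct node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
	n->prev = n->next = NULL;
}

/* Stand-in for kvm_mmu_prepare_zap_page(): unlink the page and report
 * the number of pages freed.  In this model it is always exactly 1,
 * mirroring Sean's observation that MMIO sptes can't have unsynced
 * children -- nothing else in the list is ever perturbed, which is
 * precisely why the restart looks unnecessary for the MMIO case. */
static int prepare_zap(struct node *n)
{
	list_del(n);
	return 1;
}

/* The loop shape from the patch: restart from the head whenever
 * prepare_zap() returns non-zero.  Returns total entries visited so
 * the quadratic behavior can be measured. */
static long zap_restarting(struct node *head)
{
	struct node *sp, *tmp;
	long visited = 0;

restart:
	for (sp = head->next; sp != head; sp = tmp) {
		tmp = sp->next;
		visited++;
		if (!sp->mmio_cached)
			continue;
		if (prepare_zap(sp))
			goto restart;
	}
	return visited;
}
```

With N pages of which the last K are mmio_cached, each of the K zaps rescans
the (N-K)-entry prefix, so the walk does (N-K+1)*K + (N-K) visits instead of
N -- quadratic when K scales with N.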