On Wed, 2013-08-07 at 15:33 +0530, Bharat Bhushan wrote:
> When the MM code is invalidating a range of pages, it calls the KVM
> kvm_mmu_notifier_invalidate_range_start() notifier function, which calls
> kvm_unmap_hva_range(), which arranges to flush all the TLBs for guest pages.
> However, the Linux PTEs for the range being flushed are still valid at
> that point. We are not supposed to establish any new references to pages
> in the range until the ...range_end() notifier gets called.
> The PPC-specific KVM code doesn't get any explicit notification of that;
> instead, we are supposed to use mmu_notifier_retry() to test whether we
> are or have been inside a range flush notifier pair while we have been
> referencing a page.
>
> This patch calls mmu_notifier_retry() while mapping the guest page to
> ensure we do not reference a page while a range invalidation is in
> progress.
>
> This call is inside a region locked with kvm->mmu_lock, which is the
> same lock that is taken by the KVM MMU notifier functions, thus
> ensuring that no new notification can proceed while we are in the
> locked region.
>
> Signed-off-by: Bharat Bhushan <bharat.bhushan@xxxxxxxxxxxxx>
> ---
>  arch/powerpc/kvm/e500_mmu_host.c |   19 +++++++++++++++++--
>  1 files changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/arch/powerpc/kvm/e500_mmu_host.c b/arch/powerpc/kvm/e500_mmu_host.c
> index ff6dd66..ae4eaf6 100644
> --- a/arch/powerpc/kvm/e500_mmu_host.c
> +++ b/arch/powerpc/kvm/e500_mmu_host.c
> @@ -329,8 +329,14 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>  	int tsize = BOOK3E_PAGESZ_4K;
>  	unsigned long tsize_pages = 0;
>  	pte_t *ptep;
> -	int wimg = 0;
> +	int wimg = 0, ret = 0;
>  	pgd_t *pgdir;
> +	unsigned long mmu_seq;
> +	struct kvm *kvm = vcpu_e500->vcpu.kvm;
> +
> +	/* used to check for invalidations in progress */
> +	mmu_seq = kvm->mmu_notifier_seq;
> +	smp_rmb();
>
>  	/*
>  	 * Translate guest physical to true physical, acquiring
> @@ -458,6 +464,13 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>  			(long)gfn, pfn);
>  		return -EINVAL;
>  	}
> +
> +	spin_lock(&kvm->mmu_lock);
> +	if (mmu_notifier_retry(kvm, mmu_seq)) {
> +		ret = -EAGAIN;
> +		goto out;
> +	}
> +
>  	kvmppc_e500_ref_setup(ref, gtlbe, pfn, wimg);
>
>  	kvmppc_e500_setup_stlbe(&vcpu_e500->vcpu, gtlbe, tsize,
> @@ -466,10 +479,12 @@ static inline int kvmppc_e500_shadow_map(struct kvmppc_vcpu_e500 *vcpu_e500,
>  	/* Clear i-cache for new pages */
>  	kvmppc_mmu_flush_icache(pfn);
>
> +out:
> +	spin_unlock(&kvm->mmu_lock);
>  	/* Drop refcount on page, so that mmu notifiers can clear it */
>  	kvm_release_pfn_clean(pfn);
>
> -	return 0;
> +	return ret;
>  }

Acked-by: Scott Wood <scottwood@xxxxxxxxxxxxx>

...since it's currently the standard KVM approach, though I'm not happy
about the busy-waiting aspect of it. What if we have preempted the thread
responsible for decrementing mmu_notifier_count, especially if we are a
SCHED_FIFO task of higher priority than that thread?

-Scott
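For reference, below is a minimal, self-contained user-space sketch of the
mmu_notifier_retry() handshake the patch relies on, not code from the thread.
All names here (struct mmu_model, range_start(), range_end(), notifier_retry(),
shadow_map()) are hypothetical stand-ins for kvm->mmu_notifier_seq,
kvm->mmu_notifier_count, the invalidate_range_start()/...range_end() callbacks,
mmu_notifier_retry() and kvmppc_e500_shadow_map(); a pthread mutex stands in
for kvm->mmu_lock and also provides the ordering that smp_rmb() supplies in the
kernel. It models the logic only, not the real KVM data structures.

/* cc -pthread retry_model.c */
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

struct mmu_model {
    pthread_mutex_t lock;         /* stands in for kvm->mmu_lock */
    unsigned long notifier_seq;   /* stands in for kvm->mmu_notifier_seq */
    unsigned long notifier_count; /* stands in for kvm->mmu_notifier_count */
};

/* ...range_start(): a flush is beginning, so mappers must back off */
static void range_start(struct mmu_model *m)
{
    pthread_mutex_lock(&m->lock);
    m->notifier_count++;
    pthread_mutex_unlock(&m->lock);
}

/* ...range_end(): bump the sequence before dropping the count, so a mapper
 * that raced with the whole start/end pair still notices that it happened */
static void range_end(struct mmu_model *m)
{
    pthread_mutex_lock(&m->lock);
    m->notifier_seq++;
    m->notifier_count--;
    pthread_mutex_unlock(&m->lock);
}

/* Model of mmu_notifier_retry(); caller must hold m->lock */
static bool notifier_retry(struct mmu_model *m, unsigned long seq)
{
    return m->notifier_count != 0 || m->notifier_seq != seq;
}

/* Shape of the mapping path after the patch */
static int shadow_map(struct mmu_model *m)
{
    unsigned long seq = m->notifier_seq; /* sampled first; smp_rmb() here in the kernel */
    int ret = 0;

    /* ... translate the gfn, walk the Linux page tables, pick WIMG bits ... */

    pthread_mutex_lock(&m->lock);
    if (notifier_retry(m, seq)) {
        ret = -EAGAIN;  /* caller retries; no stale translation is installed */
        goto out;
    }
    /* safe point: the shadow TLB entry would be written here */
out:
    pthread_mutex_unlock(&m->lock);
    return ret;
}

int main(void)
{
    struct mmu_model m = { .lock = PTHREAD_MUTEX_INITIALIZER };

    printf("no flush:     %d\n", shadow_map(&m));   /* 0 */
    range_start(&m);
    printf("during flush: %d\n", shadow_map(&m));   /* -EAGAIN */
    range_end(&m);
    printf("after flush:  %d\n", shadow_map(&m));   /* 0 */
    return 0;
}

The ordering is the point: the mapper samples the sequence number before
touching the Linux PTE, and only commits the shadow TLB entry under the lock
if no flush is in progress (count is zero) and none has completed since the
sample (sequence unchanged). Otherwise it returns -EAGAIN and the caller maps
again, which is the busy-waiting aspect Scott is unhappy about.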