Re: [PATCH v2 14/15] drm/amdgpu: Use mmu_range_notifier instead of hmm_mirror

Jason Gunthorpe <jgg@xxxxxxxxxxxx> · Fri, 1 Nov 2019 18:21:06 +0000

On Fri, Nov 01, 2019 at 02:44:51PM +0000, Yang, Philip wrote:
> @@ -854,12 +853,20 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>  		r = -EPERM;
>  		goto out_unlock;
>  	}
> +	up_read(&mm->mmap_sem);
> +	timeout = jiffies + msecs_to_jiffies(HMM_RANGE_DEFAULT_TIMEOUT);
> +
> +retry:
> +	range->notifier_seq = mmu_range_read_begin(&bo->notifier);
>  
> +	down_read(&mm->mmap_sem);
>  	r = hmm_range_fault(range, 0);
>  	up_read(&mm->mmap_sem);
> -
> -	if (unlikely(r < 0))
> +	if (unlikely(r <= 0)) {
> +		if ((r == 0 || r == -EBUSY) && !time_after(jiffies, timeout))
> +			goto retry;
>  		goto out_free_pfns;
> +	}

I was reflecting on why this suddently became necessary, and I think
what might be happening is that hmm_range_fault() is trigging
invalidations as it runs (ie it is faulting in pages or something) and
that in turn causes the mrn to need retry.

The hmm version of this had a bug where a full
invalidate_range_start/end pair would not trigger retry, so this this
didn't happen.

This is unfortunate as the retry is unnecessary, but at this time I
can't think of a good way to separate an ignorable synchronous
invalidation caused by hmm_range_fault from an async one that cannot
be ignored..

A basic fix would be to not update the mrq seq in the notifier if
the invalidate is triggered by hmm_range_fault, but that seems
difficult to determine..

Any thoughts Jerome?

Jason