Re: [PATCH v2 14/15] drm/amdgpu: Use mmu_range_notifier instead of hmm_mirror

Jason Gunthorpe <jgg@xxxxxxxxxxxx> · Fri, 1 Nov 2019 19:51:28 +0000

On Fri, Nov 01, 2019 at 07:45:22PM +0000, Yang, Philip wrote:

> > This must be done inside the notifier_lock, after checking
> > mmu_range_read_retry(), all handling of the struct page must be
> > structured like that.
> > 
> Below change will fix this, then driver will call mmu_range_read_retry 
> second time using same range->notifier_seq to check if range is 
> invalidated inside amdgpu_cs_submit, this looks ok for me.

Lets defer this to some patch trying to fix it, I find it hard to
follow..

> @@ -868,6 +869,13 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo 
> *bo, struct page **pages)
>                  goto out_free_pfns;
>          }
> 
> +       mutex_lock(&adev->notifier_lock);
> +
> +       if (mmu_range_read_retry(&bo->notifier, range->notifier_seq)) {
> +               mutex_unlock(&adev->notifier_lock);
> +               goto retry;
> +       }
> +
>          for (i = 0; i < ttm->num_pages; i++) {
>                  pages[i] = hmm_device_entry_to_page(range, range->pfns[i]);
>                  if (unlikely(!pages[i])) {
> @@ -875,10 +883,12 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo 
> *bo, struct page **pages)
>                                 i, range->pfns[i]);
>                          r = -ENOMEM;
> 
> +                       mutex_unlock(&adev->notifier_lock);
>                          goto out_free_pfns;
>                  }
>          }

Well, maybe? 

The question now is what happens to 'pages' ? With this arrangment the
driver cannot touch 'pages' without also again going under the lock
and checking retry. 

If it doesn't touch it, then lets just move this device_entry_to_page
to a more appropriate place?

I'd prefer it if the driver could be structured in the normal way,
with a clear locked region where the page list is handled..

Jason