On Fri, Nov 01, 2019 at 07:45:22PM +0000, Yang, Philip wrote: > > This must be done inside the notifier_lock, after checking > > mmu_range_read_retry(), all handling of the struct page must be > > structured like that. > > > Below change will fix this, then driver will call mmu_range_read_retry > second time using same range->notifier_seq to check if range is > invalidated inside amdgpu_cs_submit, this looks ok for me. Lets defer this to some patch trying to fix it, I find it hard to follow.. > @@ -868,6 +869,13 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo > *bo, struct page **pages) > goto out_free_pfns; > } > > + mutex_lock(&adev->notifier_lock); > + > + if (mmu_range_read_retry(&bo->notifier, range->notifier_seq)) { > + mutex_unlock(&adev->notifier_lock); > + goto retry; > + } > + > for (i = 0; i < ttm->num_pages; i++) { > pages[i] = hmm_device_entry_to_page(range, range->pfns[i]); > if (unlikely(!pages[i])) { > @@ -875,10 +883,12 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo > *bo, struct page **pages) > i, range->pfns[i]); > r = -ENOMEM; > > + mutex_unlock(&adev->notifier_lock); > goto out_free_pfns; > } > } Well, maybe? The question now is what happens to 'pages' ? With this arrangment the driver cannot touch 'pages' without also again going under the lock and checking retry. If it doesn't touch it, then lets just move this device_entry_to_page to a more appropriate place? I'd prefer it if the driver could be structured in the normal way, with a clear locked region where the page list is handled.. Jason