Sorry, resending the patch; the one in the previous email missed a couple of lines due to copy/paste.

On 2019-11-01 3:45 p.m., Yang, Philip wrote:
> 
> 
> On 2019-11-01 1:42 p.m., Jason Gunthorpe wrote:
>> On Fri, Nov 01, 2019 at 03:59:26PM +0000, Yang, Philip wrote:
>>>> This test for range_blockable should be before mutex_lock, I can move
>>>> it up
>>>>
>>> yes, thanks.
>>
>> Okay, I wrote it like this:
>>
>>     if (mmu_notifier_range_blockable(range))
>>         mutex_lock(&adev->notifier_lock);
>>     else if (!mutex_trylock(&adev->notifier_lock))
>>         return false;
>>
>>>> Also, do you know if notifier_lock is held while calling
>>>> amdgpu_ttm_tt_get_user_pages_done()? Can we add a 'lock assert held'
>>>> to amdgpu_ttm_tt_get_user_pages_done()?
>>>
>>> gpu side hold notifier_lock but kfd side doesn't. kfd side doesn't check
>>> amdgpu_ttm_tt_get_user_pages_done/mmu_range_read_retry return value but
>>> check mem->invalid flag which is updated from invalidate callback. It
>>> takes more time to change, I will come to another patch to fix it later.
>>
>> Ah.. confusing, OK, I'll let you sort that
>>
>>>> However, this is all pre-existing bugs, so I'm OK go ahead with this
>>>> patch as modified. I advise AMD to make a followup patch ..
>>>>
>>> yes, I will.
>>
>> While you are here, this is also wrong:
>>
>> int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>> {
>>     down_read(&mm->mmap_sem);
>>     r = hmm_range_fault(range, 0);
>>     up_read(&mm->mmap_sem);
>>     if (unlikely(r <= 0)) {
>>         if ((r == 0 || r == -EBUSY) && !time_after(jiffies, timeout))
>>             goto retry;
>>         goto out_free_pfns;
>>     }
>>
>>     for (i = 0; i < ttm->num_pages; i++) {
>>         pages[i] = hmm_device_entry_to_page(range, range->pfns[i]);
>>
>> It is not allowed to read the results of hmm_range_fault() outside
>> locking, and in particular, we can't convert to a struct page.
>>
>> This must be done inside the notifier_lock, after checking
>> mmu_range_read_retry(), all handling of the struct page must be
>> structured like that.
>>
> Below change will fix this, then driver will call mmu_range_read_retry
> second time using same range->notifier_seq to check if range is
> invalidated inside amdgpu_cs_submit, this looks ok for me.
> 
@@ -797,6 +797,7 @@ static const uint64_t hmm_range_values[HMM_PFN_VALUE_MAX] = {
  */
 int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
 {
+    struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
     struct ttm_tt *ttm = bo->tbo.ttm;
     struct amdgpu_ttm_tt *gtt = (void *)ttm;
     unsigned long start = gtt->userptr;
@@ -868,6 +869,13 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
         goto out_free_pfns;
     }
 
+    mutex_lock(&adev->notifier_lock);
+
+    if (mmu_range_read_retry(&bo->notifier, range->notifier_seq)) {
+        mutex_unlock(&adev->notifier_lock);
+        goto retry;
+    }
+
     for (i = 0; i < ttm->num_pages; i++) {
         pages[i] = hmm_device_entry_to_page(range, range->pfns[i]);
         if (unlikely(!pages[i])) {
@@ -875,10 +883,12 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
                i, range->pfns[i]);
             r = -ENOMEM;
 
+            mutex_unlock(&adev->notifier_lock);
             goto out_free_pfns;
         }
     }
 
+    mutex_unlock(&adev->notifier_lock);
     gtt->range = range;
     mmput(mm);
 
> 
> Philip
> 
>>>>> @@ -997,10 +1004,18 @@ static void amdgpu_ttm_tt_unpin_userptr(struct ttm_tt *ttm)
>>>>>  	sg_free_table(ttm->sg);
>>>>>  
>>>>>  #if IS_ENABLED(CONFIG_DRM_AMDGPU_USERPTR)
>>>>> -	if (gtt->range &&
>>>>> -	    ttm->pages[0] == hmm_device_entry_to_page(gtt->range,
>>>>> -						      gtt->range->pfns[0]))
>>>>> -		WARN_ONCE(1, "Missing get_user_page_done\n");
>>>>> +	if (gtt->range) {
>>>>> +		unsigned long i;
>>>>> +
>>>>> +		for (i = 0; i < ttm->num_pages; i++) {
>>>>> +			if (ttm->pages[i] !=
>>>>> +			    hmm_device_entry_to_page(gtt->range,
>>>>> +						     gtt->range->pfns[i]))
>>>>> +				break;
>>>>> +		}
>>>>> +
>>>>> +		WARN((i == ttm->num_pages), "Missing get_user_page_done\n");
>>>>> +	}
>>>>
>>>> Is this related/necessary? I can put it in another patch if it is just
>>>> debugging improvement? Please advise
>>>>
>>> I see this WARN backtrace now, but I didn't see it before. This is
>>> somehow related.
>>
>> Hm, might be instructive to learn what is going on..
>>
>> Thanks,
>> Jason
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
> 
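
For the 'lock assert held' follow-up Jason asks about above, a minimal sketch of what could sit at the top of amdgpu_ttm_tt_get_user_pages_done() is shown below; how that function reaches adev is an assumption here, the thread only establishes that adev->notifier_lock is the lock in question:

    /*
     * Sketch only, not part of the posted patch: the hmm_range_fault()
     * results kept in gtt->range may only be inspected while
     * adev->notifier_lock is held, so document and enforce that rule.
     * lockdep_assert_held() is compiled out when lockdep is disabled.
     */
    lockdep_assert_held(&adev->notifier_lock);

With lockdep enabled, the kfd path that currently relies on mem->invalid instead of mmu_range_read_retry() would presumably trip this assertion until it is converted to take the lock as well, which matches the follow-up patch discussed above.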