On Tue, Oct 29, 2019 at 04:28:43PM +0000, Kuehling, Felix wrote: > On 2019-10-28 4:10 p.m., Jason Gunthorpe wrote: > > From: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > > > > find_vma() must be called under the mmap_sem, reorganize this code to > > do the vma check after entering the lock. > > > > Further, fix the unlocked use of struct task_struct's mm, instead use > > the mm from hmm_mirror which has an active mm_grab. Also the mm_grab > > must be converted to a mm_get before acquiring mmap_sem or calling > > find_vma(). > > > > Fixes: 66c45500bfdc ("drm/amdgpu: use new HMM APIs and helpers") > > Fixes: 0919195f2b0d ("drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads") > > Cc: Alex Deucher <alexander.deucher@xxxxxxx> > > Cc: Christian König <christian.koenig@xxxxxxx> > > Cc: David (ChunMing) Zhou <David1.Zhou@xxxxxxx> > > Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx > > Signed-off-by: Jason Gunthorpe <jgg@xxxxxxxxxxxx> > > One question inline to confirm my understanding. Otherwise this patch is > > Reviewed-by: Felix Kuehling <Felix.Kuehling@xxxxxxx> Thanks > > - if (unlikely((gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) && > > - vma->vm_file)) { > > - r = -EPERM; > > - goto out; > > - } > > + mm = mirror->hmm->mmu_notifier.mm; > > + if (!mmget_not_zero(mm)) /* Happens during process shutdown */ > > This works because mirror->hmm->mmu_notifier holds an mmgrab reference > to the mm? Yes, this makes sure the mm pointer remains valid > So the MM will not just go away, but if the mmget refcount is 0, it > means the mm is marked for destruction and shouldn't be used any > more. Not just marked for destruction, but that another thread is progressing or finished release(). The other detail here is that in general you can't get the mmap_sem without also having a mmget as exit_mmap() does not lock the mmap_sem in some places where it alters the datastructures. ie racing find_vma() with exit_mmap() is not allowed. This means we have to hold the mmget across the hmm_range_fault(), but we can drop the mmget and then test mmu_range_read_retry() under the driver lock. It will return true if the mmget refcount has gone to zero in the mean time. But I think this is probably a poor driver design, a driver should just hold the mmget() until it has completed establishing the shadow PTEs, as it is hard to see a reason not to.. Jason