Hi, Christian, On Thu, 2021-10-07 at 16:57 +0200, Christian König wrote: > Am 07.10.21 um 15:26 schrieb Thomas Hellström: > > While the range notifier is executing, we have the write-side mmu > > interval > > seqlock, and mmu_interval_read_retry() is always returning true, > > which means that if amdgpu_cs_submit grabs the notifier lock during > > the > > fence wait, it will retry anyway when checking the userptr validity > > and block when retrying in mmu_interval_read_begin(). > > (See the extensive comments in mmu_interval_read_begin()) > > > > Hence we can release the notifier lock before the fence wait and > > avoid > > a device-wide command submission block during invalidation. > > First of all I'm not convinced that this works and second blocking > the > CS while an MMU invalidation is underway is completely intentional. > > In other words when for example fork() is called in parallel with a > CS > the CS should be blocked until the invalidation caused by the fork() > operation is completed and *NOT* risk that the CS succeeds and adds > another dependency to the MMU invalidation. The point is that command submission can't succeed since while the notifer runs, we have the write-side seqlock. It's released when the notifier ends, and we can rely on that. The following will happen: Thread1 Thread 2 enter_cs(); enter_invalidation(); notifier_lock() update_seqno(); notifier_unlock(); notifier_lock(); fence_wait(); validate_userptr(); (invalid_seqno) notifier_unlock(); (retry) mmu_interval_read_begin(); (blocks) fence_signal(); (notifier ends) mn_itree_inv_end(); (unblocks, new seqno); enter_cs(); (succeeds) So the difference is that you block only CS that affect that particular userptr, not CS across the entire device. So for the sake of discussion, is there a particular situation where you see a race that might happen here? Thanks, Thomas > > Christian. > > > > > Cc: Christian König <christian.koenig@xxxxxxx> > > Signed-off-by: Thomas Hellström <thomas.hellstrom@xxxxxxxxxxxxxxx> > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c > > index 4b153daf283d..d3d340a6129c 100644 > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c > > @@ -75,9 +75,10 @@ static bool amdgpu_mn_invalidate_gfx(struct > > mmu_interval_notifier *mni, > > > > mmu_interval_set_seq(mni, cur_seq); > > > > + mutex_unlock(&adev->notifier_lock); > > + > > r = dma_resv_wait_timeout(bo->tbo.base.resv, true, false, > > MAX_SCHEDULE_TIMEOUT); > > - mutex_unlock(&adev->notifier_lock); > > if (r <= 0) > > DRM_ERROR("(%ld) failed to wait for user bo\n", r); > > return true; >