Hi, Christian. I'd appreciate if you could respond to the below, since it is a bit hard to try to design around a problem I don't believe exists, and come up with a good solution for that. In short. 1) To prefault userptr we have to exit the ww transaction anyway. 2) Any contended lock held at loop start is completely encapsulated in the ww transaction and can and will be unlocked when exiting it, so this patch doesn't introduce any additional problems for userptr handling AFAICT. 3) The need for a fully capable ww transaction helper moving forward. If we need a tool that also does userptr locking, then I think we need to separate that from the ww transaction tool and only pass the latter around to TTM. Thanks, Thomas On Wed, 2024-05-22 at 19:42 +0200, Thomas Hellström wrote: > On Wed, 2024-05-22 at 18:52 +0200, Christian König wrote: > > Am 22.05.24 um 16:32 schrieb Thomas Hellström: > > > On Wed, 2024-05-22 at 07:52 +0200, Christian König wrote: > > > > Am 21.05.24 um 09:16 schrieb Thomas Hellström: > > > > > If contention and backoff occurs during a drm_exec ww > > > > > transaction, > > > > > the contended lock is not locked again until the next orinary > > > > > attempt to lock a dma_resv lock. However, with the > > > > > introduction > > > > > of > > > > > drm_exec_trylock(), that doesn't work, since the locking of > > > > > the > > > > > contended lock needs to be a sleeping lock. Neither can we > > > > > ignore > > > > > locking the contended lock during a trylock since that would > > > > > violate > > > > > at least the ww_mutex annotations. > > > > > > > > > > So resolve this by actually locking the contended lock during > > > > > drm_exec_retry_on_contention(). However, this introduces a > > > > > new > > > > > point > > > > > of failure since locking the contended lock may return - > > > > > EINTR. > > > > > > > > > > Hence drm_exec_retry_on_contention() must take an error > > > > > parameter > > > > > and > > > > > also return a value indicating success. > > > > After thinking more about that I have to pretty clearly NAK > > > > this. > > > > > > > I thought we were beyond upfront NAKing in the first reply :/ > > > > Well my memory could fail me, but I mentioned concerns on this > > approach > > before. > > > > I was a bit annoyed seeing that again. But could as well be that my > > response never got out or that I'm mixing things up. > > I haven't seen it at least. Last discussion on this I saw was > here. I didn't see a follow-up on that. > > https://lore.kernel.org/dri-devel/953c157bf69df12d831a781f0f638d93717bb044.camel@xxxxxxxxxxxxxxx/ > > > > > > > > It's an intentional design decision to guarantee that at the > > > > start of > > > > the loop no object is locked. > > > > > > > > This is because Sima and I wanted to integrate userptr handling > > > > into > > > > drm_exec as well in the long term. > > > First I agree the interface looks worse with this patch. > > > But I thought generic userptr handling were going to end up as a > > > gpuvm > > > helper (without using GEM objects) as we've discussed previously. > > > > We might be talking past each other. That sounds like SVM, e.g. on > > demand paging. > > > > What I mean is pre-faulting during command submission like radeon, > > amdgpu and i915 do for the userptr handling. > > Yes, then we're talking about the same thing. > > We discussed in this thread here, started by Dave. > > https://lore.kernel.org/dri-devel/CAPM=9twPgn+fpbkig0Vhjt=cJdHQFbNH_Z=sRhSZwuvLKhavbA@xxxxxxxxxxxxxx/ > > I still think the right place is in drm_gpuvm for this sort of stuff. > And I think that's the concluding argument by Sima as well. > > In any case, If the planned drm_exec development is to be a full > execbuf helper, I think we need a capable sub-helper for ONLY the ww > transaction locking as well, with support for the various locking > primitives. In particular if we're going to be able to port i915 ww > transaction locking over. There are more uses of the ww locking > transacions than execbuf. > > > > > For that you need to re-start the whole handling similar to how you > > need > > to re-start for the mutex locking when you detect that the page > > array > > is > > stale, the difference is that you are not allowed to hold any resv > > locks > > while pre-faulting. > > > > That's why it is a requirement that the drm_exec loop starts > > without > > any > > locks held. > > But wouldn't you need an outer (userptr) loop and an inner > (ww_transaction) loop for this? Why would we want to re-validate > userptrs on -EDEADLKS? > > > > > > Anyway if still there would be helpers in drm_exec for some other > > > generic userptr solution, those need to be done before the > > > ww_acquire_ctx_init(). The contended locking here is done after, > > > so > > > I > > > can't really see how these would clash. > > > > Yes, that indeed was a problem. The ww_acquire_ctx_init() was > > intentionally moved into drm_exec_cleanup() to partially prevent > > that > > issue. > > > > I haven't fully figured out how to do handle everything exactly, > > but > > at > > least in principle it can be made work. With this change here it > > becomes > > impossible. > > > > > Still, If we need to come up with another solution, I think it's > > > fair > > > we clearly sort out why. > > > > > > > I think we should just document that drm_exec_trylock() can't > > > > be > > > > used > > > > to > > > > lock the first BO in the loop and explicitly WARN if that's the > > > > case. > > > Unfortunately that's not sufficient for the general use-case. If > > > we > > > want to keep the ttm_bo_vm approach of dropping the mmap lock > > > when > > > there is contention on the bo resv, we need to be able to trylock > > > on > > > first lock. > > > > Mhm, why exactly do we still have that dance in the first place? > > > > I mean we have sorted out the mmap() and dma_resv() locking order > > long > > ago. See dma_resv_lockdep() which is enforcing that. > > I explained that in my reply here: > > https://lore.kernel.org/dri-devel/953c157bf69df12d831a781f0f638d93717bb044.camel@xxxxxxxxxxxxxxx/ > > We shouldn't be holding the mmap lock when waiting for stuff. In > particular not while waiting for mutexes that may be blocked by gpu > activity. > > > > > > Also bo creation is using trylock but might be able to use > > > a sleeping lock there. But if that sleeping lock triggers an - > > > EDEADLK > > > (DEBUG_WW_MUTEX_SLOWPATH) we have the weird situation of > > > referencing an > > > object that never was fully created as a contending object. > > > > I wanted to eliminate that as well by not validating the BO during > > initialization any more. > > > > > So bo creation would then be: > > > > ttm_bo_init(bo) > > > > drm_exec_while_not_all_locked() { > > drm_exec_prepare_object(bo, 1); > > > > ttm_bo_validate(bo); > > } > > > > if (r) > > ttm_bo_put(bo); > > > > return r; > > > > I have that on a branch here somewhere prepared, but never got the > > time > > to clean it up. > > Still, bo creation and validation may be part of a ww transaction as > well, like page-table bos (Although those are pre-locked so perhaps > not > a good example). But in the general case, I'm not sure this is > sufficient for all use-cases. > > /Thomas > > > > > > > Regards, > > Christian. > > > > > > > > So the only really working alternative solution I can see is that > > > drm_exec_trylock simply fails if there is a contended lock and > > > we'd > > > need to live with the weird bo creation situation described > > > above. > > > > > > /Thomas > > > > > > > Regards, > > > > Christian. > > > > > > > > > Cc: Christian König<christian.koenig@xxxxxxx> > > > > > Cc: Somalapuram Amaranath<Amaranath.Somalapuram@xxxxxxx> > > > > > Cc: Matthew Brost<matthew.brost@xxxxxxxxx> > > > > > Cc:<dri-devel@xxxxxxxxxxxxxxxxxxxxx> > > > > > Signed-off-by: Thomas > > > > > Hellström<thomas.hellstrom@xxxxxxxxxxxxxxx> > > > > > --- > > > > > .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 16 ++++--- > > > > > -- > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 6 ++-- > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c | 4 +-- > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 8 ++--- > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 8 ++--- > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 4 +-- > > > > > drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c | 8 ++--- > > > > > drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 +- > > > > > drivers/gpu/drm/drm_exec.c | 35 > > > > > ++++++++++++++----- > > > > > drivers/gpu/drm/drm_gpuvm.c | 8 ++--- > > > > > drivers/gpu/drm/imagination/pvr_job.c | 2 +- > > > > > drivers/gpu/drm/msm/msm_gem_submit.c | 2 +- > > > > > drivers/gpu/drm/nouveau/nouveau_uvmm.c | 2 +- > > > > > drivers/gpu/drm/tests/drm_exec_test.c | 12 +++---- > > > > > drivers/gpu/drm/xe/xe_gt_pagefault.c | 4 +-- > > > > > drivers/gpu/drm/xe/xe_vm.c | 10 +++--- > > > > > include/drm/drm_exec.h | 23 > > > > > +++++++++--- > > > > > 17 files changed, 92 insertions(+), 62 deletions(-) > > > > > > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > > > > > index e4d4e55c08ad..4a08a692aa1f 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c > > > > > @@ -1152,12 +1152,12 @@ static int reserve_bo_and_vm(struct > > > > > kgd_mem > > > > > *mem, > > > > > drm_exec_init(&ctx->exec, > > > > > DRM_EXEC_INTERRUPTIBLE_WAIT, > > > > > 0); > > > > > drm_exec_until_all_locked(&ctx->exec) { > > > > > ret = amdgpu_vm_lock_pd(vm, &ctx->exec, 2); > > > > > - drm_exec_retry_on_contention(&ctx->exec); > > > > > + ret = drm_exec_retry_on_contention(&ctx- > > > > > >exec, > > > > > ret); > > > > > if (unlikely(ret)) > > > > > goto error; > > > > > > > > > > ret = drm_exec_prepare_obj(&ctx->exec, &bo- > > > > > > tbo.base, 1); > > > > > - drm_exec_retry_on_contention(&ctx->exec); > > > > > + ret = drm_exec_retry_on_contention(&ctx- > > > > > >exec, > > > > > ret); > > > > > if (unlikely(ret)) > > > > > goto error; > > > > > } > > > > > @@ -1199,14 +1199,14 @@ static int > > > > > reserve_bo_and_cond_vms(struct > > > > > kgd_mem *mem, > > > > > > > > > > ret = amdgpu_vm_lock_pd(entry- > > > > > >bo_va- > > > > > > base.vm, > > > > > &ctx->exec, > > > > > 2); > > > > > - drm_exec_retry_on_contention(&ctx- > > > > > > exec); > > > > > + ret = > > > > > drm_exec_retry_on_contention(&ctx- > > > > > > exec, ret); > > > > > if (unlikely(ret)) > > > > > goto error; > > > > > ++ctx->n_vms; > > > > > } > > > > > > > > > > ret = drm_exec_prepare_obj(&ctx->exec, &bo- > > > > > > tbo.base, 1); > > > > > - drm_exec_retry_on_contention(&ctx->exec); > > > > > + ret = drm_exec_retry_on_contention(&ctx- > > > > > >exec, > > > > > ret); > > > > > if (unlikely(ret)) > > > > > goto error; > > > > > } > > > > > @@ -2619,7 +2619,7 @@ static int > > > > > validate_invalid_user_pages(struct > > > > > amdkfd_process_info *process_info) > > > > > list_for_each_entry(peer_vm, &process_info- > > > > > > vm_list_head, > > > > > vm_list_node) { > > > > > ret = amdgpu_vm_lock_pd(peer_vm, > > > > > &exec, > > > > > 2); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + ret = > > > > > drm_exec_retry_on_contention(&exec, > > > > > ret); > > > > > if (unlikely(ret)) > > > > > goto unreserve_out; > > > > > } > > > > > @@ -2631,7 +2631,7 @@ static int > > > > > validate_invalid_user_pages(struct > > > > > amdkfd_process_info *process_info) > > > > > > > > > > gobj = &mem->bo->tbo.base; > > > > > ret = drm_exec_prepare_obj(&exec, > > > > > gobj, > > > > > 1); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + ret = > > > > > drm_exec_retry_on_contention(&exec, > > > > > ret); > > > > > if (unlikely(ret)) > > > > > goto unreserve_out; > > > > > } > > > > > @@ -2875,7 +2875,7 @@ int > > > > > amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct > > > > > dma_fence __rcu * > > > > > list_for_each_entry(peer_vm, &process_info- > > > > > > vm_list_head, > > > > > vm_list_node) { > > > > > ret = amdgpu_vm_lock_pd(peer_vm, > > > > > &exec, > > > > > 2); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + ret = > > > > > drm_exec_retry_on_contention(&exec, > > > > > ret); > > > > > if (unlikely(ret)) { > > > > > pr_err("Locking VM PD > > > > > failed, > > > > > ret: > > > > > %d\n", ret); > > > > > goto ttm_reserve_fail; > > > > > @@ -2891,7 +2891,7 @@ int > > > > > amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct > > > > > dma_fence __rcu * > > > > > > > > > > gobj = &mem->bo->tbo.base; > > > > > ret = drm_exec_prepare_obj(&exec, > > > > > gobj, > > > > > 1); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + ret = > > > > > drm_exec_retry_on_contention(&exec, > > > > > ret); > > > > > if (unlikely(ret)) { > > > > > pr_err("drm_exec_prepare_obj > > > > > failed, ret: %d\n", ret); > > > > > goto ttm_reserve_fail; > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > > > > > index ec888fc6ead8..299e46a6d934 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c > > > > > @@ -897,7 +897,7 @@ static int amdgpu_cs_parser_bos(struct > > > > > amdgpu_cs_parser *p, > > > > > > > > > > drm_exec_until_all_locked(&p->exec) { > > > > > r = amdgpu_vm_lock_pd(&fpriv->vm, &p->exec, > > > > > 1 > > > > > + p- > > > > > > gang_size); > > > > > - drm_exec_retry_on_contention(&p->exec); > > > > > + r = drm_exec_retry_on_contention(&p->exec, > > > > > r); > > > > > if (unlikely(r)) > > > > > goto out_free_user_pages; > > > > > > > > > > @@ -905,7 +905,7 @@ static int amdgpu_cs_parser_bos(struct > > > > > amdgpu_cs_parser *p, > > > > > /* One fence for TTM and one for > > > > > each > > > > > CS > > > > > job */ > > > > > r = drm_exec_prepare_obj(&p->exec, > > > > > &e- > > > > > > bo- > > > > > > tbo.base, > > > > > 1 + p- > > > > > > gang_size); > > > > > - drm_exec_retry_on_contention(&p- > > > > > > exec); > > > > > + r = drm_exec_retry_on_contention(&p- > > > > > > exec, > > > > > r); > > > > > if (unlikely(r)) > > > > > goto out_free_user_pages; > > > > > > > > > > @@ -915,7 +915,7 @@ static int amdgpu_cs_parser_bos(struct > > > > > amdgpu_cs_parser *p, > > > > > if (p->uf_bo) { > > > > > r = drm_exec_prepare_obj(&p->exec, > > > > > &p- > > > > > > uf_bo->tbo.base, > > > > > 1 + p- > > > > > > gang_size); > > > > > - drm_exec_retry_on_contention(&p- > > > > > > exec); > > > > > + r = drm_exec_retry_on_contention(&p- > > > > > > exec, > > > > > r); > > > > > if (unlikely(r)) > > > > > goto out_free_user_pages; > > > > > } > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c > > > > > index cfdf558b48b6..8b2b86c7a6c5 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_csa.c > > > > > @@ -74,7 +74,7 @@ int amdgpu_map_static_csa(struct > > > > > amdgpu_device > > > > > *adev, struct amdgpu_vm *vm, > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0); > > > > > if (likely(!r)) > > > > > r = drm_exec_lock_obj(&exec, &bo- > > > > > > tbo.base); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = drm_exec_retry_on_contention(&exec, r); > > > > > if (unlikely(r)) { > > > > > DRM_ERROR("failed to reserve CSA,PD > > > > > BOs: > > > > > err=%d\n", r); > > > > > goto error; > > > > > @@ -114,7 +114,7 @@ int amdgpu_unmap_static_csa(struct > > > > > amdgpu_device *adev, struct amdgpu_vm *vm, > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0); > > > > > if (likely(!r)) > > > > > r = drm_exec_lock_obj(&exec, &bo- > > > > > > tbo.base); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = drm_exec_retry_on_contention(&exec, r); > > > > > if (unlikely(r)) { > > > > > DRM_ERROR("failed to reserve CSA,PD > > > > > BOs: > > > > > err=%d\n", r); > > > > > goto error; > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > > > > > index 67c234bcf89f..17e16c971e21 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c > > > > > @@ -239,12 +239,12 @@ static void > > > > > amdgpu_gem_object_close(struct > > > > > drm_gem_object *obj, > > > > > drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0); > > > > > drm_exec_until_all_locked(&exec) { > > > > > r = drm_exec_prepare_obj(&exec, &bo- > > > > > >tbo.base, > > > > > 1); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = drm_exec_retry_on_contention(&exec, r); > > > > > if (unlikely(r)) > > > > > goto out_unlock; > > > > > > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = drm_exec_retry_on_contention(&exec, r); > > > > > if (unlikely(r)) > > > > > goto out_unlock; > > > > > } > > > > > @@ -776,13 +776,13 @@ int amdgpu_gem_va_ioctl(struct > > > > > drm_device > > > > > *dev, void *data, > > > > > drm_exec_until_all_locked(&exec) { > > > > > if (gobj) { > > > > > r = drm_exec_lock_obj(&exec, gobj); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = > > > > > drm_exec_retry_on_contention(&exec, > > > > > r); > > > > > if (unlikely(r)) > > > > > goto error; > > > > > } > > > > > > > > > > r = amdgpu_vm_lock_pd(&fpriv->vm, &exec, 2); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = drm_exec_retry_on_contention(&exec, r); > > > > > if (unlikely(r)) > > > > > goto error; > > > > > } > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c > > > > > index 5ca5c47ab54e..1b1a5147606e 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c > > > > > @@ -1221,12 +1221,12 @@ int > > > > > amdgpu_mes_ctx_map_meta_data(struct > > > > > amdgpu_device *adev, > > > > > drm_exec_until_all_locked(&exec) { > > > > > r = drm_exec_lock_obj(&exec, > > > > > &ctx_data- > > > > > > meta_data_obj- > > > > > > tbo.base); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = drm_exec_retry_on_contention(&exec, r); > > > > > if (unlikely(r)) > > > > > goto error_fini_exec; > > > > > > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = drm_exec_retry_on_contention(&exec, r); > > > > > if (unlikely(r)) > > > > > goto error_fini_exec; > > > > > } > > > > > @@ -1292,12 +1292,12 @@ int > > > > > amdgpu_mes_ctx_unmap_meta_data(struct > > > > > amdgpu_device *adev, > > > > > drm_exec_until_all_locked(&exec) { > > > > > r = drm_exec_lock_obj(&exec, > > > > > &ctx_data- > > > > > > meta_data_obj- > > > > > > tbo.base); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = drm_exec_retry_on_contention(&exec, r); > > > > > if (unlikely(r)) > > > > > goto out_unlock; > > > > > > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = drm_exec_retry_on_contention(&exec, r); > > > > > if (unlikely(r)) > > > > > goto out_unlock; > > > > > } > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c > > > > > index e22cb2b5cd92..72b8213e352c 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c > > > > > @@ -77,7 +77,7 @@ int amdgpu_seq64_map(struct amdgpu_device > > > > > *adev, > > > > > struct amdgpu_vm *vm, > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0); > > > > > if (likely(!r)) > > > > > r = drm_exec_lock_obj(&exec, &bo- > > > > > > tbo.base); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = drm_exec_retry_on_contention(&exec, r); > > > > > if (unlikely(r)) > > > > > goto error; > > > > > } > > > > > @@ -138,7 +138,7 @@ void amdgpu_seq64_unmap(struct > > > > > amdgpu_device > > > > > *adev, struct amdgpu_fpriv *fpriv) > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0); > > > > > if (likely(!r)) > > > > > r = drm_exec_lock_obj(&exec, &bo- > > > > > > tbo.base); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = drm_exec_retry_on_contention(&exec, r); > > > > > if (unlikely(r)) > > > > > goto error; > > > > > } > > > > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c > > > > > b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c > > > > > index e01c1c8e64c4..63392ce43945 100644 > > > > > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c > > > > > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c > > > > > @@ -89,12 +89,12 @@ static int map_ring_data(struct > > > > > amdgpu_device > > > > > *adev, struct amdgpu_vm *vm, > > > > > drm_exec_init(&exec, 0, 0); > > > > > drm_exec_until_all_locked(&exec) { > > > > > r = drm_exec_lock_obj(&exec, &bo->tbo.base); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = drm_exec_retry_on_contention(&exec, r); > > > > > if (unlikely(r)) > > > > > goto error_fini_exec; > > > > > > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = drm_exec_retry_on_contention(&exec, r); > > > > > if (unlikely(r)) > > > > > goto error_fini_exec; > > > > > } > > > > > @@ -152,12 +152,12 @@ static int unmap_ring_data(struct > > > > > amdgpu_device *adev, struct amdgpu_vm *vm, > > > > > drm_exec_init(&exec, 0, 0); > > > > > drm_exec_until_all_locked(&exec) { > > > > > r = drm_exec_lock_obj(&exec, &bo->tbo.base); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = drm_exec_retry_on_contention(&exec, r); > > > > > if (unlikely(r)) > > > > > goto out_unlock; > > > > > > > > > > r = amdgpu_vm_lock_pd(vm, &exec, 0); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + r = drm_exec_retry_on_contention(&exec, r); > > > > > if (unlikely(r)) > > > > > goto out_unlock; > > > > > } > > > > > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c > > > > > b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c > > > > > index 386875e6eb96..a3aa7fd22f6a 100644 > > > > > --- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c > > > > > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c > > > > > @@ -1499,7 +1499,7 @@ static int svm_range_reserve_bos(struct > > > > > svm_validate_context *ctx, bool intr) > > > > > vm = drm_priv_to_vm(pdd->drm_priv); > > > > > > > > > > r = amdgpu_vm_lock_pd(vm, &ctx- > > > > > >exec, > > > > > 2); > > > > > - drm_exec_retry_on_contention(&ctx- > > > > > > exec); > > > > > + r = > > > > > drm_exec_retry_on_contention(&ctx- > > > > > > exec, r); > > > > > if (unlikely(r)) { > > > > > pr_debug("failed %d to > > > > > reserve > > > > > bo\n", r); > > > > > goto unreserve_out; > > > > > diff --git a/drivers/gpu/drm/drm_exec.c > > > > > b/drivers/gpu/drm/drm_exec.c > > > > > index 2da094bdf8a4..3770a5d30213 100644 > > > > > --- a/drivers/gpu/drm/drm_exec.c > > > > > +++ b/drivers/gpu/drm/drm_exec.c > > > > > @@ -28,12 +28,12 @@ > > > > > * drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT); > > > > > * drm_exec_until_all_locked(&exec) { > > > > > * ret = drm_exec_prepare_obj(&exec, boA, 1); > > > > > - * drm_exec_retry_on_contention(&exec); > > > > > + * ret = drm_exec_retry_on_contention(&exec, > > > > > ret); > > > > > * if (ret) > > > > > * goto error; > > > > > * > > > > > * ret = drm_exec_prepare_obj(&exec, boB, 1); > > > > > - * drm_exec_retry_on_contention(&exec); > > > > > + * ret = drm_exec_retry_on_contention(&exec, > > > > > ret); > > > > > * if (ret) > > > > > * goto error; > > > > > * } > > > > > @@ -48,7 +48,8 @@ > > > > > */ > > > > > > > > > > /* Dummy value used to initially enter the retry loop */ > > > > > -#define DRM_EXEC_DUMMY ((void *)~0) > > > > > +#define DRM_EXEC_DUMMY ERR_PTR(-ESTALE) > > > > > +#define DRM_EXEC_CONTENDED ERR_PTR(-EDEADLK) > > > > > > > > > > /* Unlock all objects and drop references */ > > > > > static void drm_exec_unlock_all(struct drm_exec *exec) > > > > > @@ -131,8 +132,7 @@ bool drm_exec_cleanup(struct drm_exec > > > > > *exec) > > > > > return true; > > > > > } > > > > > > > > > > - drm_exec_unlock_all(exec); > > > > > - exec->num_objects = 0; > > > > > + exec->contended = NULL; > > > > > return true; > > > > > } > > > > > EXPORT_SYMBOL(drm_exec_cleanup); > > > > > @@ -194,6 +194,27 @@ static int > > > > > drm_exec_lock_contended(struct > > > > > drm_exec *exec) > > > > > return ret; > > > > > } > > > > > > > > > > +/** > > > > > + * drm_exec_handle_contended() - Perform cleanup before a ww > > > > > transaction restart > > > > > + * @exec: Pointer to the drm_exec object. > > > > > + * > > > > > + * Unlocks all held resvs and re-locks the contended object. > > > > > + * > > > > > + * Return: 0 on success, negative error code on failure. > > > > > + */ > > > > > +int drm_exec_handle_contended(struct drm_exec *exec) > > > > > +{ > > > > > + int ret; > > > > > + > > > > > + drm_exec_unlock_all(exec); > > > > > + exec->num_objects = 0; > > > > > + ret = drm_exec_lock_contended(exec); > > > > > + exec->contended = DRM_EXEC_CONTENDED; > > > > > + > > > > > + return ret; > > > > > +} > > > > > +EXPORT_SYMBOL(drm_exec_handle_contended); > > > > > + > > > > > /** > > > > > * drm_exec_lock_obj - lock a GEM object for use > > > > > * @exec: the drm_exec object with the state > > > > > @@ -209,10 +230,6 @@ int drm_exec_lock_obj(struct drm_exec > > > > > *exec, > > > > > struct drm_gem_object *obj) > > > > > { > > > > > int ret; > > > > > > > > > > - ret = drm_exec_lock_contended(exec); > > > > > - if (unlikely(ret)) > > > > > - return ret; > > > > > - > > > > > if (exec->prelocked == obj) { > > > > > drm_gem_object_put(exec->prelocked); > > > > > exec->prelocked = NULL; > > > > > diff --git a/drivers/gpu/drm/drm_gpuvm.c > > > > > b/drivers/gpu/drm/drm_gpuvm.c > > > > > index f9eb56f24bef..0923d6ae18e2 100644 > > > > > --- a/drivers/gpu/drm/drm_gpuvm.c > > > > > +++ b/drivers/gpu/drm/drm_gpuvm.c > > > > > @@ -1254,18 +1254,18 @@ drm_gpuvm_exec_lock(struct > > > > > drm_gpuvm_exec > > > > > *vm_exec) > > > > > > > > > > drm_exec_until_all_locked(exec) { > > > > > ret = drm_gpuvm_prepare_vm(gpuvm, exec, > > > > > num_fences); > > > > > - drm_exec_retry_on_contention(exec); > > > > > + ret = drm_exec_retry_on_contention(exec, > > > > > ret); > > > > > if (ret) > > > > > goto err; > > > > > > > > > > ret = drm_gpuvm_prepare_objects(gpuvm, exec, > > > > > num_fences); > > > > > - drm_exec_retry_on_contention(exec); > > > > > + ret = drm_exec_retry_on_contention(exec, > > > > > ret); > > > > > if (ret) > > > > > goto err; > > > > > > > > > > if (vm_exec->extra.fn) { > > > > > ret = vm_exec->extra.fn(vm_exec); > > > > > - drm_exec_retry_on_contention(exec); > > > > > + ret = > > > > > drm_exec_retry_on_contention(exec, > > > > > ret); > > > > > if (ret) > > > > > goto err; > > > > > } > > > > > @@ -1346,7 +1346,7 @@ drm_gpuvm_exec_lock_range(struct > > > > > drm_gpuvm_exec *vm_exec, > > > > > drm_exec_until_all_locked(exec) { > > > > > ret = drm_gpuvm_prepare_range(gpuvm, exec, > > > > > addr, > > > > > range, > > > > > vm_exec- > > > > > > num_fences); > > > > > - drm_exec_retry_on_contention(exec); > > > > > + ret = drm_exec_retry_on_contention(exec, > > > > > ret); > > > > > if (ret) > > > > > goto err; > > > > > } > > > > > diff --git a/drivers/gpu/drm/imagination/pvr_job.c > > > > > b/drivers/gpu/drm/imagination/pvr_job.c > > > > > index 78c2f3c6dce0..6e0ce6c4576c 100644 > > > > > --- a/drivers/gpu/drm/imagination/pvr_job.c > > > > > +++ b/drivers/gpu/drm/imagination/pvr_job.c > > > > > @@ -574,7 +574,7 @@ prepare_job_resvs_for_each(struct > > > > > drm_exec > > > > > *exec, struct pvr_job_data *job_data, > > > > > drm_exec_until_all_locked(exec) { > > > > > int err = jobs_lock_all_objs(exec, job_data, > > > > > job_count); > > > > > > > > > > - drm_exec_retry_on_contention(exec); > > > > > + err = drm_exec_retry_on_contention(exec, > > > > > err); > > > > > if (err) > > > > > return err; > > > > > } > > > > > diff --git a/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > b/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > index fba78193127d..01992b43ea4b 100644 > > > > > --- a/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > +++ b/drivers/gpu/drm/msm/msm_gem_submit.c > > > > > @@ -259,7 +259,7 @@ static int submit_lock_objects(struct > > > > > msm_gem_submit *submit) > > > > > for (unsigned i = 0; i < submit->nr_bos; > > > > > i++) > > > > > { > > > > > struct drm_gem_object *obj = submit- > > > > > > bos[i].obj; > > > > > ret = drm_exec_prepare_obj(&submit- > > > > > > exec, > > > > > obj, 1); > > > > > - > > > > > drm_exec_retry_on_contention(&submit- > > > > > > exec); > > > > > + ret = > > > > > drm_exec_retry_on_contention(&submit->exec, ret); > > > > > if (ret) > > > > > goto error; > > > > > } > > > > > diff --git a/drivers/gpu/drm/nouveau/nouveau_uvmm.c > > > > > b/drivers/gpu/drm/nouveau/nouveau_uvmm.c > > > > > index ee02cd833c5e..0c871634fdfb 100644 > > > > > --- a/drivers/gpu/drm/nouveau/nouveau_uvmm.c > > > > > +++ b/drivers/gpu/drm/nouveau/nouveau_uvmm.c > > > > > @@ -1350,7 +1350,7 @@ nouveau_uvmm_bind_job_submit(struct > > > > > nouveau_job *job, > > > > > drm_exec_init(exec, vme->flags, 0); > > > > > drm_exec_until_all_locked(exec) { > > > > > ret = bind_lock_validate(job, exec, vme- > > > > > > num_fences); > > > > > - drm_exec_retry_on_contention(exec); > > > > > + ret = drm_exec_retry_on_contention(exec, > > > > > ret); > > > > > if (ret) { > > > > > op = list_last_op(&bind_job->ops); > > > > > goto unwind; > > > > > diff --git a/drivers/gpu/drm/tests/drm_exec_test.c > > > > > b/drivers/gpu/drm/tests/drm_exec_test.c > > > > > index 81f928a429ba..28558fdb08df 100644 > > > > > --- a/drivers/gpu/drm/tests/drm_exec_test.c > > > > > +++ b/drivers/gpu/drm/tests/drm_exec_test.c > > > > > @@ -63,7 +63,7 @@ static void test_lock(struct kunit *test) > > > > > drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, > > > > > 0); > > > > > drm_exec_until_all_locked(&exec) { > > > > > ret = drm_exec_lock_obj(&exec, &gobj); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + ret = drm_exec_retry_on_contention(&exec, > > > > > ret); > > > > > KUNIT_EXPECT_EQ(test, ret, 0); > > > > > if (ret) > > > > > break; > > > > > @@ -83,14 +83,14 @@ static void test_lock_unlock(struct kunit > > > > > *test) > > > > > drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, > > > > > 0); > > > > > drm_exec_until_all_locked(&exec) { > > > > > ret = drm_exec_lock_obj(&exec, &gobj); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + ret = drm_exec_retry_on_contention(&exec, > > > > > ret); > > > > > KUNIT_EXPECT_EQ(test, ret, 0); > > > > > if (ret) > > > > > break; > > > > > > > > > > drm_exec_unlock_obj(&exec, &gobj); > > > > > ret = drm_exec_lock_obj(&exec, &gobj); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + ret = drm_exec_retry_on_contention(&exec, > > > > > ret); > > > > > KUNIT_EXPECT_EQ(test, ret, 0); > > > > > if (ret) > > > > > break; > > > > > @@ -110,13 +110,13 @@ static void test_duplicates(struct > > > > > kunit > > > > > *test) > > > > > drm_exec_init(&exec, DRM_EXEC_IGNORE_DUPLICATES, 0); > > > > > drm_exec_until_all_locked(&exec) { > > > > > ret = drm_exec_lock_obj(&exec, &gobj); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + ret = drm_exec_retry_on_contention(&exec, > > > > > ret); > > > > > KUNIT_EXPECT_EQ(test, ret, 0); > > > > > if (ret) > > > > > break; > > > > > > > > > > ret = drm_exec_lock_obj(&exec, &gobj); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + ret = drm_exec_retry_on_contention(&exec, > > > > > ret); > > > > > KUNIT_EXPECT_EQ(test, ret, 0); > > > > > if (ret) > > > > > break; > > > > > @@ -137,7 +137,7 @@ static void test_prepare(struct kunit > > > > > *test) > > > > > drm_exec_init(&exec, DRM_EXEC_INTERRUPTIBLE_WAIT, > > > > > 0); > > > > > drm_exec_until_all_locked(&exec) { > > > > > ret = drm_exec_prepare_obj(&exec, &gobj, 1); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + ret = drm_exec_retry_on_contention(&exec, > > > > > ret); > > > > > KUNIT_EXPECT_EQ(test, ret, 0); > > > > > if (ret) > > > > > break; > > > > > diff --git a/drivers/gpu/drm/xe/xe_gt_pagefault.c > > > > > b/drivers/gpu/drm/xe/xe_gt_pagefault.c > > > > > index 040dd142c49c..20ec1ab1b52d 100644 > > > > > --- a/drivers/gpu/drm/xe/xe_gt_pagefault.c > > > > > +++ b/drivers/gpu/drm/xe/xe_gt_pagefault.c > > > > > @@ -200,7 +200,7 @@ static int handle_pagefault(struct xe_gt > > > > > *gt, > > > > > struct pagefault *pf) > > > > > drm_exec_init(&exec, 0, 0); > > > > > drm_exec_until_all_locked(&exec) { > > > > > ret = xe_pf_begin(&exec, vma, atomic, tile- > > > > > > id); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + ret = drm_exec_retry_on_contention(&exec, > > > > > ret); > > > > > if (ret) > > > > > goto unlock_dma_resv; > > > > > > > > > > @@ -543,7 +543,7 @@ static int handle_acc(struct xe_gt *gt, > > > > > struct > > > > > acc *acc) > > > > > drm_exec_init(&exec, 0, 0); > > > > > drm_exec_until_all_locked(&exec) { > > > > > ret = xe_pf_begin(&exec, vma, true, tile- > > > > > >id); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + ret = drm_exec_retry_on_contention(&exec, > > > > > ret); > > > > > if (ret) > > > > > break; > > > > > } > > > > > diff --git a/drivers/gpu/drm/xe/xe_vm.c > > > > > b/drivers/gpu/drm/xe/xe_vm.c > > > > > index e2ec148c9c33..335524e803e7 100644 > > > > > --- a/drivers/gpu/drm/xe/xe_vm.c > > > > > +++ b/drivers/gpu/drm/xe/xe_vm.c > > > > > @@ -501,7 +501,7 @@ static void > > > > > preempt_rebind_work_func(struct > > > > > work_struct *w) > > > > > bool done = false; > > > > > > > > > > err = xe_preempt_work_begin(&exec, vm, > > > > > &done); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + err = drm_exec_retry_on_contention(&exec, > > > > > err); > > > > > if (err || done) { > > > > > drm_exec_fini(&exec); > > > > > if (err && > > > > > xe_vm_validate_should_retry(&exec, err, &end)) > > > > > @@ -1052,7 +1052,7 @@ static void > > > > > xe_vma_destroy_unlocked(struct > > > > > xe_vma *vma) > > > > > drm_exec_init(&exec, 0, 0); > > > > > drm_exec_until_all_locked(&exec) { > > > > > err = xe_vm_lock_vma(&exec, vma); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + err = drm_exec_retry_on_contention(&exec, > > > > > err); > > > > > if (XE_WARN_ON(err)) > > > > > break; > > > > > } > > > > > @@ -2148,11 +2148,11 @@ static struct xe_vma *new_vma(struct > > > > > xe_vm > > > > > *vm, struct drm_gpuva_op_map *op, > > > > > err = 0; > > > > > if (!bo->vm) { > > > > > err = > > > > > drm_exec_lock_obj(&exec, > > > > > xe_vm_obj(vm)); > > > > > - > > > > > drm_exec_retry_on_contention > > > > > (& > > > > > exec); > > > > > + err = > > > > > drm_exec_retry_on_contention(&exec, err); > > > > > } > > > > > if (!err) { > > > > > err = > > > > > drm_exec_lock_obj(&exec, > > > > > &bo->ttm.base); > > > > > - > > > > > drm_exec_retry_on_contention > > > > > (& > > > > > exec); > > > > > + err = > > > > > drm_exec_retry_on_contention(&exec, err); > > > > > } > > > > > if (err) { > > > > > drm_exec_fini(&exec); > > > > > @@ -2884,7 +2884,7 @@ static int > > > > > vm_bind_ioctl_ops_execute(struct > > > > > xe_vm *vm, > > > > > DRM_EXEC_IGNORE_DUPLICATES, 0); > > > > > drm_exec_until_all_locked(&exec) { > > > > > err = vm_bind_ioctl_ops_lock_and_prep(&exec, > > > > > vm, > > > > > vops); > > > > > - drm_exec_retry_on_contention(&exec); > > > > > + err = drm_exec_retry_on_contention(&exec, > > > > > err); > > > > > if (err) > > > > > goto unlock; > > > > > > > > > > diff --git a/include/drm/drm_exec.h b/include/drm/drm_exec.h > > > > > index aa786b828a0a..fafb40d96e38 100644 > > > > > --- a/include/drm/drm_exec.h > > > > > +++ b/include/drm/drm_exec.h > > > > > @@ -51,6 +51,8 @@ struct drm_exec { > > > > > struct drm_gem_object *prelocked; > > > > > }; > > > > > > > > > > +int drm_exec_handle_contended(struct drm_exec *exec); > > > > > + > > > > > /** > > > > > * drm_exec_obj() - Return the object for a give drm_exec > > > > > index > > > > > * @exec: Pointer to the drm_exec context > > > > > @@ -113,15 +115,26 @@ __PASTE(__drm_exec_, > > > > > __LINE__): \ > > > > > /** > > > > > * drm_exec_retry_on_contention - restart the loop to grap > > > > > all > > > > > locks > > > > > * @exec: drm_exec object > > > > > + * @_ret: The current error status > > > > > * > > > > > * Control flow helper to continue when a contention was > > > > > detected > > > > > and we need to > > > > > * clean up and re-start the loop to prepare all GEM > > > > > objects. > > > > > + * > > > > > + * Return: If no loop restart occurred: The error status. > > > > > */ > > > > > -#define > > > > > drm_exec_retry_on_contention(exec) \ > > > > > - do > > > > > { \ > > > > > - if > > > > > (unlikely(drm_exec_is_contended(exec))) \ > > > > > - goto > > > > > *__drm_exec_retry_ptr; \ > > > > > - } while (0) > > > > > +#define drm_exec_retry_on_contention(exec, > > > > > _ret) \ > > > > > + ({ > > > > > > > > > > \ > > > > > + struct drm_exec *__exec = > > > > > (exec); \ > > > > > + int __ret = > > > > > (_ret); \ > > > > > + > > > > > > > > > > \ > > > > > + if (unlikely(drm_exec_is_contended(__exec))) > > > > > { \ > > > > > + WARN_ON(__ret != - > > > > > EDEADLK); \ > > > > > + __ret = > > > > > drm_exec_handle_contended(__exec); \ > > > > > + if > > > > > (!__ret) \ > > > > > + goto > > > > > *__drm_exec_retry_ptr; \ > > > > > + } > > > > > > > > > > \ > > > > > + __ret; > > > > > > > > > > \ > > > > > + }) > > > > > > > > > > /** > > > > > * drm_exec_is_contended - check for contention >