Quoting Tvrtko Ursulin (2019-09-18 14:38:06) > > On 17/09/2019 16:39, Chris Wilson wrote: > > As we need to take a walk back along the signaler timeline to find the > > fence before upon which we want to wait, we need to lock that timeline > > to prevent it being modified as we walk. Similarly, we also need to > > acquire a reference to the earlier fence while it still exists! > > > > Though we lack the correct locking today, we are saved by the > > overarching struct_mutex -- but that protection is being removed. > > > > v2: Tvrtko made me realise I was being lax and using annotations to > > ignore the AB-BA deadlock from the timeline overlap. As it would be > > possible to construct a second request that was using a semaphore from the > > same timeline as ourselves, we could quite easily end up in a situation > > where we deadlocked in our mutex waits. Avoid that by using a trylock > > and falling back to a normal dma-fence await if contended. > > I did not figure out the exact AB-BA, but even on a more basic level > without the deadlock, using trylock would mean false positives ie. > falling back to software signaling with random mutex contention on the > same timeline. From a performance perspective this sounds not end of the > world, just unfortunate, but from the design perspective it has me > running away scared. > > I guess the AB-BA would be interdependent requests from two timelines > where the direction of dependency switches over across two pairs of > submissions. Exactly. Oh, you haven't seen the worst of it yet. This is a wonderful mess that just keeps on getting worse as you dig in. > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> > > --- > > drivers/gpu/drm/i915/i915_request.c | 56 +++++++++++++++++++---------- > > 1 file changed, 37 insertions(+), 19 deletions(-) > > > > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c > > index f12358150097..4e861673fe5c 100644 > > --- a/drivers/gpu/drm/i915/i915_request.c > > +++ b/drivers/gpu/drm/i915/i915_request.c > > @@ -767,16 +767,35 @@ i915_request_create(struct intel_context *ce) > > static int > > i915_request_await_start(struct i915_request *rq, struct i915_request *signal) > > { > > - if (list_is_first(&signal->link, &signal->timeline->requests)) > > + struct intel_timeline *tl = signal->timeline; > > + struct dma_fence *fence; > > + int err; > > + > > + lockdep_assert_held(&rq->timeline->mutex); > > + GEM_BUG_ON(rq->timeline == signal->timeline); > > + > > + if (list_is_first(&signal->link, &tl->requests)) > > return 0; > > > > - signal = list_prev_entry(signal, link); > > - if (intel_timeline_sync_is_later(rq->timeline, &signal->fence)) > > + if (!mutex_trylock(&tl->mutex)) > > + return -EBUSY; > > + > > + fence = NULL; > > + if (!list_is_first(&signal->link, &tl->requests)) > > + fence = dma_fence_get(&list_prev_entry(signal, link)->fence); > > + > > + mutex_unlock(&tl->mutex); > > + if (!fence) > > return 0; > > > > - return i915_sw_fence_await_dma_fence(&rq->submit, > > - &signal->fence, 0, > > - I915_FENCE_GFP); > > + err = 0; > > + if (!intel_timeline_sync_is_later(rq->timeline, fence)) > > + err = i915_sw_fence_await_dma_fence(&rq->submit, > > + fence, 0, > > + I915_FENCE_GFP); > > + dma_fence_put(fence); > > + > > + return err; > > } > > > > static intel_engine_mask_t > > @@ -804,30 +823,24 @@ emit_semaphore_wait(struct i915_request *to, > > { > > u32 hwsp_offset; > > u32 *cs; > > - int err; > > > > GEM_BUG_ON(!from->timeline->has_initial_breadcrumb); > > GEM_BUG_ON(INTEL_GEN(to->i915) < 8); > > > > /* Just emit the first semaphore we see as request space is limited. */ > > if (already_busywaiting(to) & from->engine->mask) > > - return i915_sw_fence_await_dma_fence(&to->submit, > > - &from->fence, 0, > > - I915_FENCE_GFP); > > + goto await_fence; > > > > - err = i915_request_await_start(to, from); > > - if (err < 0) > > - return err; > > + if (i915_request_await_start(to, from) < 0) > > + goto await_fence; > > Does this need to be explicitly only on -EBUSY? Otherwise if > i915_sw_fence_await_dma_fence fails in i915_request_await_start code > jump to do the same i915_sw_fence_await_dma_fence. The only one that concerned me is ignoring any potential EINTR. All the other errors are transient and so trying again with the basic await is a valid response (imo). Not bailing out due to a pending signal though is a trade-off between our latency and their latency. To be honest, I like the simpler code where we just pretend we never noticed the signal unless we block again. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx