Re: [PATCH v3] drm/i915: Lock signaler timeline while navigating

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Wed, 18 Sep 2019 14:58:24 +0100

On 18/09/2019 14:44, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2019-09-18 14:38:06)

On 17/09/2019 16:39, Chris Wilson wrote:
As we need to take a walk back along the signaler timeline to find the
fence before upon which we want to wait, we need to lock that timeline
to prevent it being modified as we walk. Similarly, we also need to
acquire a reference to the earlier fence while it still exists!

Though we lack the correct locking today, we are saved by the
overarching struct_mutex -- but that protection is being removed.

v2: Tvrtko made me realise I was being lax and using annotations to
ignore the AB-BA deadlock from the timeline overlap. As it would be
possible to construct a second request that was using a semaphore from the
same timeline as ourselves, we could quite easily end up in a situation
where we deadlocked in our mutex waits. Avoid that by using a trylock
and falling back to a normal dma-fence await if contended.

I did not figure out the exact AB-BA, but even on a more basic level
without the deadlock, using trylock would mean false positives ie.
falling back to software signaling with random mutex contention on the
same timeline. From a performance perspective this sounds not end of the
world, just unfortunate, but from the design perspective it has me
running away scared.

I guess the AB-BA would be interdependent requests from two timelines
where the direction of dependency switches over across two pairs of
submissions.

Exactly.

Oh, you haven't seen the worst of it yet. This is a wonderful mess that
just keeps on getting worse as you dig in.

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
---
   drivers/gpu/drm/i915/i915_request.c | 56 +++++++++++++++++++----------
   1 file changed, 37 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f12358150097..4e861673fe5c 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -767,16 +767,35 @@ i915_request_create(struct intel_context *ce)
   static int
   i915_request_await_start(struct i915_request *rq, struct i915_request *signal)
   {
-     if (list_is_first(&signal->link, &signal->timeline->requests))
+     struct intel_timeline *tl = signal->timeline;
+     struct dma_fence *fence;
+     int err;
+
+     lockdep_assert_held(&rq->timeline->mutex);
+     GEM_BUG_ON(rq->timeline == signal->timeline);
+
+     if (list_is_first(&signal->link, &tl->requests))
               return 0;
   
-     signal = list_prev_entry(signal, link);
-     if (intel_timeline_sync_is_later(rq->timeline, &signal->fence))
+     if (!mutex_trylock(&tl->mutex))
+             return -EBUSY;
+
+     fence = NULL;
+     if (!list_is_first(&signal->link, &tl->requests))
+             fence = dma_fence_get(&list_prev_entry(signal, link)->fence);
+
+     mutex_unlock(&tl->mutex);
+     if (!fence)
               return 0;
   
-     return i915_sw_fence_await_dma_fence(&rq->submit,
-                                          &signal->fence, 0,
-                                          I915_FENCE_GFP);
+     err = 0;
+     if (!intel_timeline_sync_is_later(rq->timeline, fence))
+             err = i915_sw_fence_await_dma_fence(&rq->submit,
+                                                 fence, 0,
+                                                 I915_FENCE_GFP);
+     dma_fence_put(fence);
+
+     return err;
   }
   
   static intel_engine_mask_t
@@ -804,30 +823,24 @@ emit_semaphore_wait(struct i915_request *to,
   {
       u32 hwsp_offset;
       u32 *cs;
-     int err;
   
       GEM_BUG_ON(!from->timeline->has_initial_breadcrumb);
       GEM_BUG_ON(INTEL_GEN(to->i915) < 8);
   
       /* Just emit the first semaphore we see as request space is limited. */
       if (already_busywaiting(to) & from->engine->mask)
-             return i915_sw_fence_await_dma_fence(&to->submit,
-                                                  &from->fence, 0,
-                                                  I915_FENCE_GFP);
+             goto await_fence;
   
-     err = i915_request_await_start(to, from);
-     if (err < 0)
-             return err;
+     if (i915_request_await_start(to, from) < 0)
+             goto await_fence;

Does this need to be explicitly only on -EBUSY? Otherwise if
i915_sw_fence_await_dma_fence fails in i915_request_await_start code
jump to do the same i915_sw_fence_await_dma_fence.

The only one that concerned me is ignoring any potential EINTR. All the
other errors are transient and so trying again with the basic await is a
valid response (imo). Not bailing out due to a pending signal though is a
trade-off between our latency and their latency. To be honest, I like
the simpler code where we just pretend we never noticed the signal
unless we block again.

Okay, should be safe anyway.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx