On 27/07/2015 12:33, Tvrtko Ursulin wrote:
Hi,
On 07/17/2015 03:31 PM, John.C.Harrison@xxxxxxxxx wrote:
From: John Harrison <John.C.Harrison@xxxxxxxxx>
The intended usage model for struct fence is that the signalled status should be
set on demand rather than polled. That is, there should not be a need for a
'signaled' function to be called every time the status is queried. Instead,
'something' should be done to enable a signal callback from the hardware which
will update the state directly. In the case of requests, this is the seqno
update interrupt. The idea is that this callback will only be enabled on demand
when something actually tries to wait on the fence.
This change removes the polling test and replaces it with the callback scheme.
Each fence is added to a 'please poke me' list at the start of
i915_add_request(). The interrupt handler then scans through the 'poke me' list
when a new seqno pops out and signals any matching fence/request. The fence is
then removed from the list so the entire request stack does not need to be
scanned every time. Note that the fence is added to the list before the commands
to generate the seqno interrupt are added to the ring. Thus the sequence is
guaranteed to be race free if the interrupt is already enabled.
Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
called). Thus there is still a potential race when enabling the interrupt as the
request may already have completed. However, this is simply solved by calling
the interrupt processing code immediately after enabling the interrupt and
thereby checking for already completed requests.
Lastly, the ring clean up code has the possibility to cancel outstanding
requests (e.g. because TDR has reset the ring). These requests will never get
signalled and so must be removed from the signal list manually. This is done by
setting a 'cancelled' flag and then calling the regular notify/retire code path
rather than attempting to duplicate the list manipulation and clean up code in
multiple places. This also avoids any race condition where the cancellation
request might occur after/during the completion interrupt actually arriving.
v2: Updated to take advantage of the request unreference no longer requiring
the mutex lock.
For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@xxxxxxxxx>
---
drivers/gpu/drm/i915/i915_drv.h | 8 ++
drivers/gpu/drm/i915/i915_gem.c | 132 +++++++++++++++++++++++++++++---
drivers/gpu/drm/i915/i915_irq.c | 2 +
drivers/gpu/drm/i915/intel_lrc.c | 1 +
drivers/gpu/drm/i915/intel_ringbuffer.c | 1 +
drivers/gpu/drm/i915/intel_ringbuffer.h | 1 +
6 files changed, 136 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 61c3db2..d7f1aa5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2163,7 +2163,11 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
struct drm_i915_gem_request {
/** Underlying object for implementing the signal/wait stuff. */
struct fence fence;
+ struct list_head signal_list;
+ struct list_head unsignal_list;
In addition to what Daniel said (one list_head looks enough) it is
customary to call it _link.
struct list_head delay_free_list;
+ bool cancelled;
+ bool irq_enabled;
/** On Which ring this request was generated */
struct drm_i915_private *i915;
@@ -2241,6 +2245,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
struct drm_i915_gem_request **req_out);
void i915_gem_request_cancel(struct drm_i915_gem_request *req);
+void i915_gem_request_submit(struct drm_i915_gem_request *req);
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req);
+void i915_gem_request_notify(struct intel_engine_cs *ring);
+
int i915_create_fence_timeline(struct drm_device *dev,
struct intel_context *ctx,
struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 482835a..7c589a9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1222,6 +1222,11 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
if (list_empty(&req->list))
return 0;
+ /*
+ * Enable interrupt completion of the request.
+ */
+ i915_gem_request_enable_interrupt(req);
+
if (i915_gem_request_completed(req))
return 0;
@@ -1382,6 +1387,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
list_del_init(&request->list);
i915_gem_request_remove_from_client(request);
+ /* In case the request is still in the signal pending list */
+ if (!list_empty(&request->signal_list))
+ request->cancelled = true;
+
i915_gem_request_unreference(request);
}
@@ -2534,6 +2543,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
*/
request->postfix = intel_ring_get_tail(ringbuf);
+ /*
+ * Add the fence to the pending list before emitting the commands to
+ * generate a seqno notification interrupt.
+ */
+ i915_gem_request_submit(request);
+
if (i915.enable_execlists)
ret = ring->emit_request(request);
else {
@@ -2653,6 +2668,9 @@ static void i915_gem_request_free(struct drm_i915_gem_request *req)
i915_gem_context_unreference(ctx);
}
+ if (req->irq_enabled)
+ req->ring->irq_put(req->ring);
+
We get here with interrupts still enabled only if userspace is
abandoning a wait on an unsignaled fence, did I get that right?
It implies the request has been abandoned in some manner, yes. E.g. TDR
has killed it, user space has given up, ...
kmem_cache_free(req->i915->requests, req);
}
@@ -2668,24 +2686,105 @@ static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
return req->ring->name;
}
-static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+/*
+ * The request has been submitted to the hardware so add the fence to the
+ * list of signalable fences.
+ *
+ * NB: This does not enable interrupts yet. That only occurs on demand when
+ * the request is actually waited on. However, adding it to the list early
+ * ensures that there is no race condition where the interrupt could pop
+ * out prematurely and thus be completely lost. The race is merely that the
+ * interrupt must be manually checked for after being enabled.
+ */
+void i915_gem_request_submit(struct drm_i915_gem_request *req)
{
- /* Interrupt driven fences are not implemented yet.*/
- WARN(true, "This should not be called!");
- return true;
+ fence_enable_sw_signaling(&req->fence);
}
-static bool i915_gem_request_is_completed(struct fence *req_fence)
+/*
+ * The request is being actively waited on, so enable interrupt based
+ * completion signalling.
+ */
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req)
+{
+ if (req->irq_enabled)
+ return;
+
+ WARN_ON(!req->ring->irq_get(req->ring));
+ req->irq_enabled = true;
req->irq_enabled manipulations look racy. Here and in request free it
is protected by struct_mutex, but that is not held in
i915_gem_request_notify. Initial feeling is you should use
ring->fence_lock everyplace you query/manipulate req->irq_enabled.
The only asynchronous access is from _notify() which disables IRQs if
the flag is set and then clears it. That can't race with the enable
because the enable only sets the flag after setting IRQs on. The worst
that can happen on a race is that IRQs are enabled and then immediately
disabled - truly concurrent execution would result in one test or the
other failing and so only one code path would be taken. The only other
usage is in _request_free() but that can only run when the last
reference has been dropped and that means it is no longer on any list
that _notify() can see.
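For what it's worth, the benign-race argument can be modelled in a small userspace sketch. All names here are illustrative stand-ins for the driver code, not the actual i915 functions: the enable path is idempotent, takes the interrupt reference before setting the flag, and then runs the notify check itself so a completion that landed before the interrupt was switched on is still caught.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model of the on-demand interrupt enable. Hypothetical
 * types standing in for intel_engine_cs / drm_i915_gem_request. */
struct mock_ring {
	uint32_t hw_seqno;	/* what get_seqno() would report */
	int irq_refcount;
};

struct mock_req {
	struct mock_ring *ring;
	uint32_t seqno;
	bool irq_enabled;
	bool signalled;
};

/* Stand-in for i915_gem_request_notify() for a single request. */
static void mock_notify(struct mock_req *req)
{
	if ((int32_t)(req->ring->hw_seqno - req->seqno) >= 0)
		req->signalled = true;
}

static void mock_enable_interrupt(struct mock_req *req)
{
	if (req->irq_enabled)
		return;

	req->ring->irq_refcount++;	/* irq_get() */
	req->irq_enabled = true;

	/* The seqno may already have popped out before the interrupt
	 * was enabled, so do an explicit check for missed completions. */
	mock_notify(req);
}
```

The worst a concurrent notify can do against this path is disable an interrupt that was only just enabled; it can never leave a completed request unsignalled, because the enable path always ends with its own check.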
+
+ /*
+ * Because the interrupt is only enabled on demand, there is a race
+ * where the interrupt can fire before anyone is looking for it. So
+ * do an explicit check for missed interrupts.
+ */
+ i915_gem_request_notify(req->ring);
+}
+
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
{
struct drm_i915_gem_request *req = container_of(req_fence,
typeof(*req), fence);
+
+ i915_gem_request_reference(req);
+ WARN_ON(!list_empty(&req->signal_list));
It looks very unsafe to proceed normally after this WARN_ON. It should
probably return false here to preserve data structure sanity.
This really should be a BUG_ON but Daniel doesn't like those. It should
be an impossible code path and not something that can be hit by the user
being dumb. Anyway, this code has all been changed in the latest
incarnation.
+ list_add_tail(&req->signal_list, &req->ring->fence_signal_list);
+
+ /*
+ * Note that signalling is always enabled for every request before
+ * that request is submitted to the hardware. Therefore there is
+ * no race condition whereby the signal could pop out before the
+ * request has been added to the list. Hence no need to check
+ * for completion, undo the list add and return false.
+ *
+ * NB: Interrupts are only enabled on demand. Thus there is still a
+ * race where the request could complete before the interrupt has
+ * been enabled. Thus care must be taken at that point.
+ */
+
+ return true;
+}
+
+void i915_gem_request_notify(struct intel_engine_cs *ring)
+{
+ struct drm_i915_gem_request *req, *req_next;
+ unsigned long flags;
u32 seqno;
+ LIST_HEAD(free_list);
- BUG_ON(req == NULL);
+ if (list_empty(&ring->fence_signal_list))
+ return;
+
+ seqno = ring->get_seqno(ring, false);
+
+ spin_lock_irqsave(&ring->fence_lock, flags);
+ list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_list) {
+ if (!req->cancelled) {
+ if (!i915_seqno_passed(seqno, req->seqno))
+ continue;
- seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
+ fence_signal_locked(&req->fence);
+ }
+
+ list_del_init(&req->signal_list);
I haven't managed to figure out why this is apparently removing requests
which have not been signalled from the signal_list. Shouldn't they be
moved to free_list only if i915_seqno_passed?
Requests are only removed from the signal list if either a) the seqno
has passed or b) the request has been cancelled and thus will never
actually complete. Not sure what other scenario you are seeing.
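The removal conditions can be sketched as a small userspace model of the scan loop (the struct and function names here are illustrative, not the driver's): a request stays on the list only while it is neither completed nor cancelled, and only completed requests get signalled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Minimal model of a request on the signal list. */
struct mock_request {
	uint32_t seqno;
	bool cancelled;
	bool signalled;
	bool on_signal_list;
};

/* Wraparound-safe comparison in the style of i915_seqno_passed(). */
static bool seqno_passed(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

/* Model of the notify scan; returns how many requests left the list. */
static int notify_scan(struct mock_request *reqs, int n, uint32_t hw_seqno)
{
	int removed = 0;

	for (int i = 0; i < n; i++) {
		struct mock_request *req = &reqs[i];

		if (!req->on_signal_list)
			continue;

		if (!req->cancelled) {
			if (!seqno_passed(hw_seqno, req->seqno))
				continue;	/* not done yet, keep it */
			req->signalled = true;	/* fence_signal_locked() */
		}

		/* Completed or cancelled: drop it so later scans do not
		 * walk the entire request stack again. */
		req->on_signal_list = false;
		removed++;
	}
	return removed;
}
```

The `continue` on an unpassed seqno is what keeps in-flight requests on the list; only the signalled-or-cancelled cases fall through to the removal.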
+ if (req->irq_enabled) {
+ req->ring->irq_put(req->ring);
+ req->irq_enabled = false;
+ }
- return i915_seqno_passed(seqno, req->seqno);
+ /* Can't unreference here because that might grab fence_lock */
+ list_add_tail(&req->unsignal_list, &free_list);
+ }
+ spin_unlock_irqrestore(&ring->fence_lock, flags);
+
+ /* It should now be safe to actually free the requests */
+ while (!list_empty(&free_list)) {
+ req = list_first_entry(&free_list,
+ struct drm_i915_gem_request, unsignal_list);
+ list_del(&req->unsignal_list);
+
+ i915_gem_request_unreference(req);
+ }
}
static void i915_fence_timeline_value_str(struct fence *fence, char *str, int size)
@@ -2711,7 +2810,6 @@ static const struct fence_ops i915_gem_request_fops = {
.get_driver_name = i915_gem_request_get_driver_name,
.get_timeline_name = i915_gem_request_get_timeline_name,
.enable_signaling = i915_gem_request_enable_signaling,
- .signaled = i915_gem_request_is_completed,
.wait = fence_default_wait,
.release = i915_gem_request_release,
.fence_value_str = i915_fence_value_str,
@@ -2791,6 +2889,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
goto err;
}
+ INIT_LIST_HEAD(&req->signal_list);
fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
ctx->engine[ring->id].fence_timeline.fence_context,
i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
@@ -2913,6 +3012,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
i915_gem_request_retire(request);
}
+
+ /*
+ * Make sure any requests that were on the signal pending list get
+ * cleaned up.
+ */
+ i915_gem_request_notify(ring);
+ i915_gem_retire_requests_ring(ring);
Would i915_gem_retire_requests_ring be enough given how it calls
i915_gem_request_notify itself as the first thing below?
Oops, left over from before retire called notify explicitly.
}
void i915_gem_restore_fences(struct drm_device *dev)
@@ -2968,6 +3074,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
{
WARN_ON(i915_verify_lists(ring->dev));
+ /*
+ * If no-one has waited on a request recently then interrupts will
+ * not have been enabled and thus no requests will ever be marked as
+ * completed. So do an interrupt check now.
+ */
+ i915_gem_request_notify(ring);
+
/* Retire requests first as we use it above for the early return.
* If we retire requests last, we may use a later seqno and so clear
* the requests lists without clearing the active list, leading to
@@ -5345,6 +5458,7 @@ init_ring_lists(struct intel_engine_cs *ring)
{
INIT_LIST_HEAD(&ring->active_list);
INIT_LIST_HEAD(&ring->request_list);
+ INIT_LIST_HEAD(&ring->fence_signal_list);
INIT_LIST_HEAD(&ring->delayed_free_list);
}
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index d87f173..e446509 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -853,6 +853,8 @@ static void notify_ring(struct intel_engine_cs *ring)
trace_i915_gem_request_notify(ring);
+ i915_gem_request_notify(ring);
+
How many requests are typically on signal_list on some typical
workloads? This could be a significant performance change since on
every user interrupt it would talk it all potentially only removing
one request at a time.
Obviously, some of the IGT tests can produce very large request lists
(e.g. gem_exec_nop) but running 'normal' stuff rarely seems to generate
a long list. E.g. running GLBench + GLXGears on an Ubuntu desktop I get
95% of the time there are ten or fewer requests in the list but the loop
iterates only once (49%) or twice (46%) because the first request gets
signalled and the second (if present) aborts the loop.
The biggest problem seems to be that the hardware is brain dead with
respect to generating interrupts. So if two requests complete in quick
succession and the ISR only gets to see the second seqno, it still gets
called a second time. Thus we actually get a ridiculous figure of 60% of
ISR calls being no-ops because the seqno has not actually advanced. The
code now checks for duplicate seqnos and early exits. Not sure how to
get rid of the call completely.
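The duplicate-seqno early exit described above amounts to two small helpers: the wraparound-safe comparison (in the style of i915_seqno_passed() from i915_drv.h) plus a last-seen check. The seqno_advanced() helper below is a hypothetical illustration of the early-exit idea, not code from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe seqno comparison: casting the unsigned difference
 * to signed keeps the test correct when the 32-bit counter wraps. */
static bool seqno_passed(uint32_t seq1, uint32_t seq2)
{
	return (int32_t)(seq1 - seq2) >= 0;
}

/* Hypothetical duplicate-interrupt filter: an ISR that remembers the
 * last seqno it acted on can skip the scan when the hardware raises
 * the interrupt again without the seqno having advanced. */
static bool seqno_advanced(uint32_t *last_seen, uint32_t seqno)
{
	if (seqno == *last_seen)
		return false;	/* no-op interrupt, skip the list walk */
	*last_seen = seqno;
	return true;
}
```

With roughly 60% of ISR calls reported as no-ops, filtering on an unchanged seqno before touching the signal list avoids most of the wasted lock acquisition and list traversal.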
These are just review comments on this particular patch without
thinking yet of the bigger design questions Daniel has raised.
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx