Re: [PATCH v7] drm/i915: Slaughter the thundering i915_wait_request herd

Dave Gordon <david.s.gordon@xxxxxxxxx> · Tue, 8 Dec 2015 14:53:27 +0000

On 08/12/15 14:33, Chris Wilson wrote:
On Tue, Dec 08, 2015 at 02:03:39PM +0000, Tvrtko Ursulin wrote:
On 08/12/15 10:44, Chris Wilson wrote:
On Mon, Dec 07, 2015 at 03:08:28PM +0000, Tvrtko Ursulin wrote:
Equally, why wouldn't we wake up all waiters for which the requests
have been completed?

Because we no longer track the requests to be completed, having moved to
a chain of waiting processes instead of a chain of requests. I could
insert a waitqueue into the intel_breadcrumb and that would indeed
necessitate locking in the irq handler (and irq locks everywhere :(.

You have a tree of seqnos each with a wait->task and current seqno
on the engine can be queried. So I don't see where is the problem?

The "problem" is that every process will do its own post-irq seqno check
after being woken up, then grab the common spinlock to remove itself
from the tree.

We could avoid that by using RB_NODE_EMPTY shortcircuiting I suppose.

Would be a cheap check here and it would save a cascading growing
latency as one task wakes up the next one.

Well, it can't be here since we may remove_waiter after a signal
(incomplete wait). So this part has to walk the chain of processes. Ugh,
and have to move the waitqueue from one waiter to the next...

Ok on interrupted waiters it makes no sense, but on normal waiter
removal it would just mean comparing engine->get_seqno() vs the
first waiter seqno and waking up all until the uncompleted one is
found?

It's a tradeoff whether it can be written more neatly than:

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index be76086..474636f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1279,6 +1279,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
                         break;
                 }

+               if (RB_EMPTY_NODE(&wait.node))
+                       break;
+
                 set_task_state(wait.task, state);

                 /* Before we do the heavier coherent read of the seqno,
@@ -1312,7 +1315,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
                         break;
         }
  out:
-       intel_engine_remove_breadcrumb(req->ring, &wait);
+       intel_engine_remove_breadcrumb(req->ring, &wait, ret);
         __set_task_state(wait.task, TASK_RUNNING);
         trace_i915_gem_request_wait_end(req);

diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index ae3ee3c..421c214 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -169,10 +169,14 @@ void intel_engine_enable_fake_irq(struct intel_engine_cs *engine)
  }

  void intel_engine_remove_breadcrumb(struct intel_engine_cs *engine,
-                                   struct intel_breadcrumb *wait)
+                                   struct intel_breadcrumb *wait,
+                                   int ret)
  {
         struct intel_breadcrumbs *b = &engine->breadcrumbs;

+       if (RB_EMPTY_NODE(&wait->node))
+               return;
+
         spin_lock(&b->lock);

         if (b->first_waiter == wait->task) {
@@ -187,6 +191,18 @@ void intel_engine_remove_breadcrumb(struct intel_engine_cs *engine,
                  * completion check.
                  */
                 next = rb_next(&wait->node);
+               if (ret == 0) {
+                       u32 seqno = intel_ring_get_seqno(engine);
+                       while (next &&
+                              i915_seqno_passed(seqno,
+                                                to_crumb(next)->seqno)) {
+                               struct rb_node *n = rb_next(next);
+                               wake_up_process(next->task);
+                               rb_erase(next, &b->requests);
+                               RB_CLEAR_NODE(next);
+                               next = n;
+                       }
+               }
                 task = next ? to_crumb(next)->task : NULL;

                 b->first_waiter = task;

My biggest complaint is that we are mixing request-complete and direct
evaluation of i915_seqno_passed now. (I expect that we can tidy the code
up somewhat.)
-Chris

So how is this going to work in the new "struct fence" regime? (See John 
Harrison's RFC of 23rd November).

The fence structure contains the per-request completion indicator, so 
once a fence has been signalled the owner only needs to check that, and 
doesn't need to reevaluate any seqno comparisons after waking up.

Does that help?

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx