Re: [PATCH 2/3] drm/i915/execlists: Suppress preempting self

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Wed, 23 Jan 2019 16:40:44 +0000

On 23/01/2019 14:22, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2019-01-23 14:08:56)

On 23/01/2019 12:36, Chris Wilson wrote:
In order to avoid preempting ourselves, we currently refuse to schedule
the tasklet if we reschedule an inflight context. However, this glosses
over a few issues such as what happens after a CS completion event and
we then preempt the newly executing context with itself, or if something
else causes a tasklet_schedule triggering the same evaluation to
preempt the active context with itself.

To avoid the extra complications, after deciding that we have
potentially queued a request with higher priority than the currently
executing request, inspect the head of the queue to see if it is indeed
higher priority from another context.

References: a2bf92e8cc16 ("drm/i915/execlists: Avoid kicking priority on the current context")
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
   drivers/gpu/drm/i915/i915_scheduler.c | 20 ++++++--
   drivers/gpu/drm/i915/intel_lrc.c      | 67 ++++++++++++++++++++++++---
   2 files changed, 76 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 340faea6c08a..fb5d953430e5 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -239,6 +239,18 @@ sched_lock_engine(struct i915_sched_node *node, struct intel_engine_cs *locked)
       return engine;
   }
   
+static bool inflight(const struct i915_request *rq,
+                  const struct intel_engine_cs *engine)
+{
+     const struct i915_request *active;
+
+     if (!rq->global_seqno)
+             return false;
+
+     active = port_request(engine->execlists.port);
+     return active->hw_context == rq->hw_context;
+}
+
   static void __i915_schedule(struct i915_request *rq,
                           const struct i915_sched_attr *attr)
   {
@@ -328,6 +340,7 @@ static void __i915_schedule(struct i915_request *rq,
               INIT_LIST_HEAD(&dep->dfs_link);
   
               engine = sched_lock_engine(node, engine);
+             lockdep_assert_held(&engine->timeline.lock);
   
               /* Recheck after acquiring the engine->timeline.lock */
               if (prio <= node->attr.priority || node_signaled(node))
@@ -356,17 +369,16 @@ static void __i915_schedule(struct i915_request *rq,
               if (prio <= engine->execlists.queue_priority)
                       continue;
   
+             engine->execlists.queue_priority = prio;
+
               /*
                * If we are already the currently executing context, don't
                * bother evaluating if we should preempt ourselves.
                */
-             if (node_to_request(node)->global_seqno &&
-                 i915_seqno_passed(port_request(engine->execlists.port)->global_seqno,
-                                   node_to_request(node)->global_seqno))
+             if (inflight(node_to_request(node), engine))
                       continue;
   
               /* Defer (tasklet) submission until after all of our updates. */
-             engine->execlists.queue_priority = prio;
               tasklet_hi_schedule(&engine->execlists.tasklet);
       }
   
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8aa8a4862543..d9d744f6ab2c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -182,12 +182,64 @@ static inline int rq_prio(const struct i915_request *rq)
   }
   
   static inline bool need_preempt(const struct intel_engine_cs *engine,
-                             const struct i915_request *last,
-                             int prio)
+                             const struct i915_request *rq,
+                             int q_prio)
   {
-     return (intel_engine_has_preemption(engine) &&
-             __execlists_need_preempt(prio, rq_prio(last)) &&
-             !i915_request_completed(last));
+     const struct intel_context *ctx = rq->hw_context;
+     const int last_prio = rq_prio(rq);
+     struct rb_node *rb;
+     int idx;
+
+     if (!intel_engine_has_preemption(engine))
+             return false;
+
+     if (i915_request_completed(rq))
+             return false;
+
+     if (!__execlists_need_preempt(q_prio, last_prio))
+             return false;
+
+     /*
+      * The queue_priority is a mere hint that we may need to preempt.
+      * If that hint is stale or we may be trying to preempt ourselves,
+      * ignore the request.
+      */
+
+     list_for_each_entry_continue(rq, &engine->timeline.requests, link) {
+             GEM_BUG_ON(rq->hw_context == ctx);

Why would there be no more requests belonging to the same context on the
engine timeline after the first one? Did you mean "if (rq->hw_context ==
ctx) continue;" ?

We enter the function with rq == execlist->port, i.e. the last request
submitted to ELSP[0]. In this loop, we iterate from the start of ELSP[1]
and all the request here belong to that context. It is illegal for
ELSP[0].lrca == ELSP[1].lrca, i.e. the context must be different.

Yes, you are right yet again. I missed the fact it is guaranteed this is 
called with port0. I wonder if function signature should change to make 
this obvious so someone doesn't get the idea to call it with any old 
request.



+             if (rq_prio(rq) > last_prio)
+                     return true;
+     }
+
+     rb = rb_first_cached(&engine->execlists.queue);
+     if (!rb)
+             return false;
+
+     priolist_for_each_request(rq, to_priolist(rb), idx)
+             return rq->hw_context != ctx && rq_prio(rq) > last_prio;

This isn't equivalent to top of the queue priority
(engine->execlists.queue_priority)? Apart from the different ctx check.
So I guess it is easier than storing new engine->execlists.queue_top_ctx
and wondering about the validity of that pointer.

The problem being that queue_priority may not always match the top of
the execlists->queue. Right, the first attempt I tried was to store the
queue_context matching the queue_priority, but due to the suppression of
inflight preemptions, it doesn't match for long, and is not as accurate
as one would like across CS events.

priolist_for_each_request() isn't too horrible for finding the first
pointer. I noted that we teach it to do: for(idx = __ffs(p->used); ...)
though. If we didn't care about checking hw_context, we can compute the
prio from (p->priority+1)<<SHIFT - ffs(p->used).

And this check is definitely needed to avoid some issue? I'll need to 
have another try of understanding the commit and code paths fully 
tomorrow. I mean, only because it would be good to have something more 
elegant that full blown tree lookup.

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx