Re: [PATCH 09/11] drm/i915/execlists: Refactor out can_merge_rq()

On 30/01/2019 18:14, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2019-01-30 18:05:42)

On 30/01/2019 02:19, Chris Wilson wrote:
In the next patch, we add another user that wants to check whether
requests can be merged into a single HW execution, and in the future we
want to add more conditions under which requests from the same context
cannot be merged. In preparation, extract out can_merge_rq().

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
---
   drivers/gpu/drm/i915/intel_lrc.c | 30 +++++++++++++++++++-----------
   1 file changed, 19 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2616b0b3e8d5..e97ce54138d3 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -285,12 +285,11 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
   }
__maybe_unused static inline bool
-assert_priority_queue(const struct intel_engine_execlists *execlists,
-                   const struct i915_request *prev,
+assert_priority_queue(const struct i915_request *prev,
                     const struct i915_request *next)
   {
-     if (!prev)
-             return true;
+     const struct intel_engine_execlists *execlists =
+             &prev->engine->execlists;
/*
        * Without preemption, the prev may refer to the still active element
@@ -601,6 +600,17 @@ static bool can_merge_ctx(const struct intel_context *prev,
       return true;
   }
+static bool can_merge_rq(const struct i915_request *prev,
+                      const struct i915_request *next)
+{
+     GEM_BUG_ON(!assert_priority_queue(prev, next));
+
+     if (!can_merge_ctx(prev->hw_context, next->hw_context))
+             return false;
+
+     return true;

I'll assume you'll be adding more checks here in the future, which is
the reason this is not simply "return can_merge_ctx(...)"?

Yes, raison d'etre of making the change.
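
For illustration only, a sketch of the shape the helper could take once such conditions are added; rq_requires_dedicated_submission() below is an invented placeholder for whatever per-request/per-context condition the later patches introduce, not anything in this series:

	static bool can_merge_rq(const struct i915_request *prev,
				 const struct i915_request *next)
	{
		GEM_BUG_ON(!assert_priority_queue(prev, next));

		if (!can_merge_ctx(prev->hw_context, next->hw_context))
			return false;

		/*
		 * Hypothetical extra condition: even requests from the
		 * same context may have to go out as separate ELSP
		 * submissions.
		 */
		if (rq_requires_dedicated_submission(next))
			return false;

		return true;
	}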

   static void port_assign(struct execlist_port *port, struct i915_request *rq)
   {
       GEM_BUG_ON(rq == port_request(port));
@@ -753,8 +763,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
               int i;
priolist_for_each_request_consume(rq, rn, p, i) {
-                     GEM_BUG_ON(!assert_priority_queue(execlists, last, rq));
-
                       /*
                        * Can we combine this request with the current port?
                        * It has to be the same context/ringbuffer and not
@@ -766,8 +774,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
                        * second request, and so we never need to tell the
                        * hardware about the first.
                        */
-                     if (last &&
-                         !can_merge_ctx(rq->hw_context, last->hw_context)) {
+                     if (last && !can_merge_rq(last, rq)) {
+                             if (last->hw_context == rq->hw_context)
+                                     goto done;

I don't get this added check. AFAICS it will only trigger with GVT,
making it not consider filling both ports when it otherwise could.

Because we are preparing for can_merge_rq() deciding not to merge the
same context. If we do that we can't continue on to the next port and
must terminate the loop, violating the trick with the hint in the
process.

This changes due to the next patch, per-context freq and probably more
that I've forgotten.
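
To spell out the new exit paths in the hunk above (annotated sketch; the comments are added here for clarity and are not in the patch):

	if (last && !can_merge_rq(last, rq)) {
		/*
		 * Requests from the same context that cannot be merged
		 * cannot be split across ports either, so terminate the
		 * dequeue loop here.
		 */
		if (last->hw_context == rq->hw_context)
			goto done;

		/*
		 * Different context: we can only carry on if there is
		 * still a free port to put it in.
		 */
		if (port == last_port)
			goto done;
		...
	}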

After a second look, I noticed the existing GVT comment a bit lower down which avoids populating port1 already.

Maybe one thing which would make sense is to re-arrange these checks in the order of "priority", like:

	if (last && !can_merge_rq(...)) {
		// naturally highest prio since it is impossible
		if (port == last_port)
			goto done;
		// 2nd highest to account for programming limitation
		else if (last->hw_context == rq->hw_context)
			goto done;
		// GVT check simplified (I think - since we know last is either different ctx or single submit)
		else if (ctx_single_port_submission(rq->hw_context))
			goto done;

+
                               /*
                                * If we are on the second port and cannot
                                * combine this request with the last, then we
@@ -787,7 +797,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
                                   ctx_single_port_submission(rq->hw_context))
                                       goto done;
-				GEM_BUG_ON(last->hw_context == rq->hw_context);

This is related to the previous comment. Rebase error?

Previous if check, so it's clear at this point that we can't be using
the same context.

Yep.


@@ -827,8 +836,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
        * request triggering preemption on the next dequeue (or subsequent
        * interrupt for secondary ports).
        */
-     execlists->queue_priority_hint =
-             port != execlists->port ? rq_prio(last) : INT_MIN;
+     execlists->queue_priority_hint = queue_prio(execlists);

This shouldn't be in this patch.

If we terminate the loop early, we need to look at the head of the
queue.

Why is it different from ending the loop early for any other (existing) reason? Although I concede that better management of queue_priority_hint is exactly what I was suggesting. Oops. The consequences are not entirely straightforward though: if we decide not to submit all of a single context, or leave port1 empty, currently we would hint scheduling the tasklet for any new submission; with this change, only after a CS or if a higher-priority ctx is submitted. Which is what makes me feel it should be a separate patch, since it is a behaviour change (a priority higher than INT_MIN is potentially at the head of the queue).
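
For reference, a simplified sketch of a queue_prio()-style helper that takes the hint from the head of the priority queue rather than from rq_prio(last); the real helper in intel_lrc.c may differ in detail (e.g. in how it folds in priolist sub-levels):

	static int queue_prio(const struct intel_engine_execlists *execlists)
	{
		struct rb_node *rb = rb_first_cached(&execlists->queue);

		if (!rb)
			return INT_MIN;

		/* Highest priority still waiting to be dequeued. */
		return to_priolist(rb)->priority;
	}

With the hint tracked this way, a new submission only reschedules the tasklet if it outranks what is already queued, rather than whenever the ports were left partially filled.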

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



