Re: [PATCH 3/3] drm/i915: Priority boost switching to an idle ring

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Mon, 23 Jan 2017 12:02:30 +0000

On Mon, Jan 23, 2017 at 11:50:47AM +0000, Tvrtko Ursulin wrote:
> 
> On 23/01/2017 11:16, Chris Wilson wrote:
> >On Mon, Jan 23, 2017 at 10:51:28AM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 21/01/2017 09:25, Chris Wilson wrote:
> >>>In order to maximise concurrency between engines, if we queue a request
> >>>to a current idle ring, reorder its dependencies to execute that request
> >>>as early as possible and ideally improve occupancy of multiple engines
> >>>simultaneously.
> >>>
> >>>Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>>---
> >>>drivers/gpu/drm/i915/i915_gem_request.h | 5 +++--
> >>>drivers/gpu/drm/i915/intel_lrc.c        | 3 +++
> >>>2 files changed, 6 insertions(+), 2 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_gem_request.h b/drivers/gpu/drm/i915/i915_gem_request.h
> >>>index ba83c507613b..7ba9cc53abe9 100644
> >>>--- a/drivers/gpu/drm/i915/i915_gem_request.h
> >>>+++ b/drivers/gpu/drm/i915/i915_gem_request.h
> >>>@@ -74,8 +74,9 @@ struct i915_priotree {
> >>>};
> >>>
> >>>enum {
> >>>-	I915_PRIORITY_LOCKED = I915_PRIORITY_MAX,
> >>>-	I915_PRIORITY_DISPLAY
> >>>+	I915_PRIORITY_STALL = I915_PRIORITY_MAX,
> >>>+	I915_PRIORITY_LOCKED,
> >>>+	I915_PRIORITY_DISPLAY,
> >>>};
> >>>
> >>>struct i915_gem_capture_list {
> >>>diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> >>>index 50bec759989f..b46cb1bb32b8 100644
> >>>--- a/drivers/gpu/drm/i915/intel_lrc.c
> >>>+++ b/drivers/gpu/drm/i915/intel_lrc.c
> >>>@@ -691,6 +691,9 @@ static void execlists_schedule(struct drm_i915_gem_request *request, int prio)
> >>>	struct i915_dependency stack;
> >>>	LIST_HEAD(dfs);
> >>>
> >>>+	if (execlists_elsp_ready(request->engine))
> >>>+		prio = max(prio, I915_PRIORITY_STALL);
> >>>+
> >>
> >>It would have to be execlists_elsp_idle for it to match with the
> >>commit message.
> >>
> >>But even then the idea worries me sufficiently that I would refrain
> >>from adding it.
> >>
> >>I don't like the name of I915_PRIORITY_STALL either, since I think
> >>about stalls as ELSP transitioning to idle and no runnable requests.
> >>It could be I915_PRIORITY_SUBMIT, but I just can't find a good story
> >>for when this might be a good idea.
> >
> >That's the case we are trying to address. This engine is stalled (no
> >requests to run), so we are boosting the priority of its deps on the
> >other engines so that stall time is minimised. That's why I choose STALL
> >because it related to the currently stalled engine.
> 
> Here we don't know that there are no other requests to run - perhaps
> they have been just queued up before this one and not in elsp yet
> (tasklet pending).
> 
> Hm, and ports are also not protected by struct_mutex but the
> engine->timeline->lock so this looks to be open for an race.

That race is immaterial, a READ_ONCE), versus the whole schedule() is run
before the request is ready to submit, and in doing so we have no
knowledge that this request is the ideal one on which to bump the priority.

And yes, this can and will bump multiple requests, whose deps then
compete in a fifo (sadly this may be a first bumped, first out).

> >>Perhaps if we combined it with the no other runnable requests on the
> >>engine it might be passable?
> >
> >We don't have that information yet (trivially available at least). But it
> >still runs into the problem that the first request added to the engine's
> >implicit timeline may not be the first request actually ready-to-submit
> >due to external third parties.
> 
> Would it not be the engine->execlist_first?

No. That is requests queued/ready for hw, beyond those already
submitted. One the one hand, that is more relaxed than checking
elsp_ready(), on the other elsp can be ready, even if we have queued requests.

Hmm. Using first is better here in that regard, but you still will not
like it since it is too eager.

> >In hopefully the near future, we could write:
> >
> >     if (!request->engine->timeline->active_seqno)
> >
> >which would only apply the magic boost to the very first rq added to
> >this engine after a period of idleness. Hmm, that's a bit too weak as we
> >try to avoid retiring.
> >
> >Not keen on that. I'd like to push for keeping ready() first, and idle()
> >as a compromise if need be. The idea is that ready() is avoiding the
> >stall on the second submission slot - but that maybe too preemptive. I
> >can see plenty of arguments and counter-arguments.
> >
> >The key problem is that this heuristic is simply too speculative. But
> >ideal information to choose the request with the shortest dep path, or
> >to choose the overall schedule to minimise delays.
> >
> >After all that I still regard this as an interesting toy that may prove
> >useful, or be counterproductive. I also regard that it should only be
> >interesting in the short term.
> 
> If in doubt leave it out. :)

Solution in search of a problem. If we can find someplace it does good,
then we give it a spin, otherwise it remains on the toy pile.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx