Re: [PATCH 29/32] drm/i915: Apply an execution_mask to the virtual_engine

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Wed, 17 Apr 2019 12:57:29 +0100

Quoting Tvrtko Ursulin (2019-04-17 12:43:49)
> 
> On 17/04/2019 08:56, Chris Wilson wrote:
> > Allow the user to direct which physical engines of the virtual engine
> > they wish to execute one, as sometimes it is necessary to override the
> > load balancing algorithm.
> > 
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_lrc.c    |  58 +++++++++++
> >   drivers/gpu/drm/i915/gt/selftest_lrc.c | 131 +++++++++++++++++++++++++
> >   drivers/gpu/drm/i915/i915_request.c    |   1 +
> >   drivers/gpu/drm/i915/i915_request.h    |   3 +
> >   4 files changed, 193 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index d6efd6aa67cb..560a18bb4cbb 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -552,6 +552,18 @@ execlists_context_schedule_out(struct i915_request *rq, unsigned long status)
> >       intel_engine_context_out(rq->engine);
> >       execlists_context_status_change(rq, status);
> >       trace_i915_request_out(rq);
> > +
> > +     /*
> > +      * If this is part of a virtual engine, its next request may have
> > +      * been blocked waiting for access to the active context. We have
> > +      * to kick all the siblings again in case we need to switch (e.g.
> > +      * the next request is not runnable on this engine). Hopefully,
> > +      * we will already have submitted the next request before the
> > +      * tasklet runs and do not need to rebuild each virtual tree
> > +      * and kick everyone again.
> > +      */
> > +     if (rq->engine != rq->hw_context->engine)
> > +             tasklet_schedule(&rq->hw_context->engine->execlists.tasklet);
> 
> Is this needed only for non-default execution_mask? If so it would be 
> good to limit it to avoid tasklet storm with plain veng.

The issue is not just with this rq but the next one. If that has a
restricted mask that prevents it running on this engine, we may have
missed the opportunity to queue it (and so never run it under just the
right circumstances).

Something like
	to_virtual_engine(rq->hw_context->engine)->request->execution_mask & ~rq->execution_mask

The storm isn't quite so bad, it's only on context-out, and we often do
succeed in keeping it busy. I was just trying to avoid pulling in ve here.

> >   static u64 execlists_update_context(struct i915_request *rq)
> > @@ -779,6 +791,9 @@ static bool virtual_matches(const struct virtual_engine *ve,
> >   {
> >       const struct intel_engine_cs *active;
> >   
> > +     if (!(rq->execution_mask & engine->mask)) /* We peeked too soon! */
> > +             return false;
> > +
> >       /*
> >        * We track when the HW has completed saving the context image
> >        * (i.e. when we have seen the final CS event switching out of
> > @@ -3139,12 +3154,44 @@ static const struct intel_context_ops virtual_context_ops = {
> >       .destroy = virtual_context_destroy,
> >   };
> >   
> > +static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
> > +{
> > +     struct i915_request *rq;
> > +     intel_engine_mask_t mask;
> 
> intel_engine_mask_t throughout is the wrong type for this, even if we 
> disallowed duplicate siblings, and even more so if we don't.

Why? It's a mask of engine->mask (and has to be of the physical id to
allow it to be position invariant in siblings[]).

> Either way it seems like the 64 sibling limit needs to be back. Or maybe 
> only in the bonding case?

?

> 
> > +
> > +     rq = READ_ONCE(ve->request);
> > +     if (!rq)
> > +             return 0;
> > +
> > +     /* The rq is ready for submission; rq->execution_mask is now stable. */
> > +     mask = rq->execution_mask;
> > +     if (unlikely(!mask)) {
> > +             /* Invalid selection, submit to a random engine in error */
> > +             i915_request_skip(rq, -ENODEV);
> 
> When can this happen? It looks like if it can happen we should reject it 
> earlier. Or if it can't then just assert.

Many submit fences can end up with an interesection of 0. This is the
convenient point to do the rejection, as with any other asynchronous
error.

> > +             mask = ve->siblings[0]->mask;
> > +     }
> > +
> > +     GEM_TRACE("%s: rq=%llx:%lld, mask=%x, prio=%d\n",
> > +               ve->base.name,
> > +               rq->fence.context, rq->fence.seqno,
> > +               mask, ve->base.execlists.queue_priority_hint);
> > +
> > +     return mask;
> > +}
> > +
> >   static void virtual_submission_tasklet(unsigned long data)
> >   {
> >       struct virtual_engine * const ve = (struct virtual_engine *)data;
> >       const int prio = ve->base.execlists.queue_priority_hint;
> > +     intel_engine_mask_t mask;
> >       unsigned int n;
> >   
> > +     rcu_read_lock();
> > +     mask = virtual_submission_mask(ve);
> > +     rcu_read_unlock();
> 
> What is the RCU for?

Accessing ve->request. There's nothing stopping another engine from
spotting the ve->request still in its tree, submitting it and it being
retired all during the read here.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx