Quoting Tvrtko Ursulin (2019-04-17 13:35:29)
> 
> On 17/04/2019 12:57, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-04-17 12:43:49)
> >>
> >> On 17/04/2019 08:56, Chris Wilson wrote:
> >>> Allow the user to direct which physical engines of the virtual engine
> >>> they wish to execute on, as sometimes it is necessary to override the
> >>> load balancing algorithm.
> >>>
> >>> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>> ---
> >>>  drivers/gpu/drm/i915/gt/intel_lrc.c    |  58 +++++++++++
> >>>  drivers/gpu/drm/i915/gt/selftest_lrc.c | 131 +++++++++++++++++++++++++
> >>>  drivers/gpu/drm/i915/i915_request.c    |   1 +
> >>>  drivers/gpu/drm/i915/i915_request.h    |   3 +
> >>>  4 files changed, 193 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> index d6efd6aa67cb..560a18bb4cbb 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> @@ -552,6 +552,18 @@ execlists_context_schedule_out(struct i915_request *rq, unsigned long status)
> >>>  	intel_engine_context_out(rq->engine);
> >>>  	execlists_context_status_change(rq, status);
> >>>  	trace_i915_request_out(rq);
> >>> +
> >>> +	/*
> >>> +	 * If this is part of a virtual engine, its next request may have
> >>> +	 * been blocked waiting for access to the active context. We have
> >>> +	 * to kick all the siblings again in case we need to switch (e.g.
> >>> +	 * the next request is not runnable on this engine). Hopefully,
> >>> +	 * we will already have submitted the next request before the
> >>> +	 * tasklet runs and do not need to rebuild each virtual tree
> >>> +	 * and kick everyone again.
> >>> +	 */
> >>> +	if (rq->engine != rq->hw_context->engine)
> >>> +		tasklet_schedule(&rq->hw_context->engine->execlists.tasklet);
> >>
> >> Is this needed only for non-default execution_mask? If so it would be
> >> good to limit it to avoid tasklet storm with plain veng.
> > 
> > The issue is not just with this rq but the next one. If that has a
> > restricted mask that prevents it running on this engine, we may have
> > missed the opportunity to queue it (and so never run it under just the
> > right circumstances).
> > 
> > Something like
> > 	to_virtual_engine(rq->hw_context->engine)->request->execution_mask & ~rq->execution_mask
> > 
> > The storm isn't quite so bad, it's only on context-out, and we often do
> > succeed in keeping it busy. I was just trying to avoid pulling in ve here.
> 
> What do you mean by the "pulling in ve" bit? Avoiding using
> to_virtual_engine like in the line you wrote above?

Just laziness hiding behind an excuse of trying not to smear veng too
widely.

> >>> +
> >>> +	rq = READ_ONCE(ve->request);
> >>> +	if (!rq)
> >>> +		return 0;
> >>> +
> >>> +	/* The rq is ready for submission; rq->execution_mask is now stable. */
> >>> +	mask = rq->execution_mask;
> >>> +	if (unlikely(!mask)) {
> >>> +		/* Invalid selection, submit to a random engine in error */
> >>> +		i915_request_skip(rq, -ENODEV);
> >>
> >> When can this happen? It looks like if it can happen we should reject it
> >> earlier. Or if it can't then just assert.
> > 
> > Many submit fences can end up with an intersection of 0. This is the
> > convenient point to do the rejection, as with any other asynchronous
> > error.
> 
> Which ones are many? Why would we have uAPI which allows setting
> impossible things where all requests will fail with -ENODEV?

But we are rejecting them in the uAPI, right here. This is the earliest
point where all the information for a particular execbuf is available
and we have the means of reporting that back.
> >>> +		mask = ve->siblings[0]->mask;
> >>> +	}
> >>> +
> >>> +	GEM_TRACE("%s: rq=%llx:%lld, mask=%x, prio=%d\n",
> >>> +		  ve->base.name,
> >>> +		  rq->fence.context, rq->fence.seqno,
> >>> +		  mask, ve->base.execlists.queue_priority_hint);
> >>> +
> >>> +	return mask;
> >>> +}
> >>> +
> >>>  static void virtual_submission_tasklet(unsigned long data)
> >>>  {
> >>>  	struct virtual_engine * const ve = (struct virtual_engine *)data;
> >>>  	const int prio = ve->base.execlists.queue_priority_hint;
> >>> +	intel_engine_mask_t mask;
> >>>  	unsigned int n;
> >>>
> >>> +	rcu_read_lock();
> >>> +	mask = virtual_submission_mask(ve);
> >>> +	rcu_read_unlock();
> >>
> >> What is the RCU for?
> > 
> > Accessing ve->request. There's nothing stopping another engine from
> > spotting the ve->request still in its tree, submitting it and it being
> > retired all during the read here.
> 
> AFAIU there can only be one instance of virtual_submission_tasklet per
> VE at a time and the code above is before the request is inserted into
> physical engine trees. So I don't get it.

But the veng is being utilized by real engines concurrently; they are
the ones who take the ve->request and execute it, and so may free the
ve->request behind the submission tasklet's back. Later on, the
spinlock comes into play, after we have decided there's a request
ready.

> Hm.. but going back to the veng patch, there is a
> GEM_BUG_ON(ve->request) in virtual_submit_request. Why couldn't this be
> called multiple times in parallel for different requests?

Because we strictly order submission into the veng so that it only
considers one ready request at a time. Processing more requests
decreased global throughput, as load-balancing was no longer "late"
(the physical engines then amalgamate all the ve requests into one
submit).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx