Quoting Tvrtko Ursulin (2018-05-08 11:02:38)
>
> On 07/05/2018 14:57, Chris Wilson wrote:
> > During request submission, we call the engine->schedule() function so
> > that we may reorder the active requests as required for inheriting the
> > new request's priority. This may schedule several tasklets to run on the
> > local CPU, but we will need to schedule the tasklets again for the new
> > request. Delay all the local tasklets until the end, so that we only
> > have to process the queue just once.
> >
> > v2: Beware PREEMPT_RCU, as then local_bh_disable() is not a
> > superset of rcu_read_lock().
> >
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/i915_request.c | 5 ++---
> >  1 file changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index e4cf76ec14a6..f336942229cf 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -1110,12 +1110,11 @@ void __i915_request_add(struct i915_request *request, bool flush_caches)
> >  	 * decide whether to preempt the entire chain so that it is ready to
> >  	 * run at the earliest possible convenience.
> >  	 */
> > -	rcu_read_lock();
> > +	local_bh_disable();
> > +	rcu_read_lock(); /* RCU serialisation for set-wedged protection */
> >  	if (engine->schedule)
> >  		engine->schedule(request, &request->ctx->sched);
> >  	rcu_read_unlock();
> > -
> > -	local_bh_disable();
> >  	i915_sw_fence_commit(&request->submit);
> >  	local_bh_enable(); /* Kick the execlists tasklet if just scheduled */
>
> Before I wasn't sure about this one, since it lengthens the softirq-off
> section and still doesn't make the re-schedule atomic. But today I am
> thinking that in normal use dependency chains should not be deep enough
> for it to become an issue.
> In the light of that, go for it.

We have big problems when the dependency chain grows, simply because we
recursively build a linear list and then walk it again. It's not the
worst nightmare in the code, and I don't think anyone has complained
about it yet, but we do have a few pathological igt tests
(gem_exec_schedule deep/wide) to show that it can be the rate-limiting
step.

As for blocking tasklets across this: only the local tasklet is blocked,
and for good reason, as we reorder the queues. A tasklet on another CPU
is also (mostly) blocked by the spinlock (and local irq off). Overall I
don't think the problem is exacerbated at all by the local_bh_disable().
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx