Re: Direct execlists submission

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Mon, 14 May 2018 11:25:45 +0100

Quoting Tvrtko Ursulin (2018-05-14 11:11:54)
> 
> On 14/05/2018 10:37, Chris Wilson wrote:
> > Continuing the discussion with the latest refactorings; however, I ran
> > some tests to measure the impact on system (!i915) latency,
> > using igt/benchmarks/gem_syslatency -t 120
> > 
> > drm-tip:
> >          latency mean=1.211us max=10us (no load)
> >          latency mean=2.611us max=83us (i915)
> > 
> >          latency mean=1.720us max=833us (no load, bg writeout)
> >          latency mean=3.294us max=607us (i915, bg writeout)
> > 
> > this series:
> >          latency mean=1.280us max=15us (no load)
> >          latency mean=9.688us max=1271us (i915)
> > 
> >          latency mean=1.712us max=1026us (no load, bg writeout)
> >          latency mean=14.347us max=489850us (i915, bg writeout)
> > 
> > That certainly takes the shine off directly using the tasklet for
> > submission from the irq handler. Being selfish, I still think we can't
> > allow the GPU to stall waiting for ksoftirqd, but at the same time we
> > need to solve the latency issues introduced elsewhere.
> 
> You dropped direct submit on idle from this incarnation, why?

It's still there, right? I just haven't made any changes towards making
it more generic.

> Before the above data my concern was that i915_tasklet in its current 
> form does not buy us anything and adds boilerplate code. I was 
> suggesting two alternatives: either no i915_tasklet at all, or a
> different, more functional and self-contained version, which you said
> wouldn't work with some future code.

We need struct tasklet_struct; I don't think we can easily replace that
locally. (And you are cc'ed on the code that truly abuses the tasklet, where
we mix preemption and reset from timers.)

> But now with this data it looks like quite a significant regression, even
> if it fixes the rthog test case. So I don't know where this leaves us. :I

With a challenge to solve. The nasty part is that the timings are still
so small that they sit in the tail of the perf profile for this, so we'll
need to look at the latency tracers instead. My first instinct is that
it is engine->timeline.lock contention. And that does seem like it is
doing the trick...
-Chris