On Mon, Nov 30, 2015 at 01:33:50PM +0000, Tvrtko Ursulin wrote:
> 
> On 30/11/15 12:38, Chris Wilson wrote:
> >On Mon, Nov 30, 2015 at 12:09:30PM +0000, Tvrtko Ursulin wrote:
> >>
> >>Hi,
> >>
> >>On 30/11/15 10:53, Chris Wilson wrote:
> >>>On Sun, Nov 29, 2015 at 08:48:06AM +0000, Chris Wilson wrote:
> >>>>+	/* Optimistic spin for the next jiffie before touching IRQs */
> >>>>+	if (intel_breadcrumbs_add_waiter(req)) {
> >>>>+		ret = __i915_spin_request(req, state);
> >>>>+		if (ret == 0)
> >>>>+			goto out;
> >>>
> >>>There are a couple of interesting side-effects here. As we now start
> >>>up the irq in parallel and keep it running for longer, irq/i915 now
> >>>consumes a lot of CPU time (like 50-100%!) for synchronous batches, but
> >>>doesn't seem to interfere with latency (as spin_request is still nicely
> >>>running and catching the request completion). That should also still
> >>>work nicely on single-cpu machines as the irq enabling should not
> >>>preempt us. The second interesting side-effect is that the synchronous
> >>>loads that regressed with a 2us spin-request timeout are now happy again
> >>>at 2us. Also, with the active-request and the can-spin check from
> >>>add_waiter, running a low-fps game with a compositor is not burning the
> >>>CPU with any spinning.
> >>
> >>Interesting? :) Sounds bad the way you presented it.
> >>
> >>Why and where is the thread burning so much CPU? Would per-engine
> >>req tree locks help?
> >
> >Simply being woken up after every batch and checking the seqno is
> >that expensive - almost as expensive as the IRQ handler itself! (I expect
> >top is adding the IRQ handler time to the irq/i915 in this case; perf
> >says that more time is spent in the IRQ than in the bottom-half.) Almost
> >all waiters will be on the same engine, so I don't expect finer-grained
> >spinlocks to be hugely important.
> >
> >>Is this new CPU time or the one which would previously be accounted
> >>against each waiter, polling in wait-request?
> >
> >This is CPU time that used to be absorbed in i915_wait_request(),
> >currently hidden by i915_spin_request(). It is the reason why having
> >every waiter do the check after every interrupt is such a big issue
> >for some workloads.
> 
> So overall total CPU time is similar, just split differently?

With the pre-enable + spin, we increase CPU time. On those synchronous
tests, the runtime is the same but we are now at 100% CPU in the test
and an additional 50-100% in the irq/i915. Afaict this only affects
those igt; my desktop usage doesn't spin as much.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
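
To make the ordering under discussion concrete, below is a minimal userspace
sketch of the spin-then-sleep wait: poll a completion flag for a bounded
window (the role __i915_spin_request plays once intel_breadcrumbs_add_waiter
has queued the waiter), then fall back to a blocking, wakeup-driven wait
(standing in for the interrupt path). The names here (spin_wait,
wait_for_request, request_done) are invented for illustration; this is an
analogue of the pattern under stated assumptions, not the i915 driver code.

	#define _POSIX_C_SOURCE 200809L
	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdbool.h>
	#include <stdio.h>
	#include <time.h>

	static atomic_bool request_done;       /* stands in for the seqno check */
	static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	static pthread_cond_t wake = PTHREAD_COND_INITIALIZER;

	static unsigned long long now_ns(void)
	{
		struct timespec ts;
		clock_gettime(CLOCK_MONOTONIC, &ts);
		return ts.tv_sec * 1000000000ull + ts.tv_nsec;
	}

	/* Busy-poll for at most spin_ns; returns true if the request completed. */
	static bool spin_wait(unsigned long long spin_ns)
	{
		unsigned long long deadline = now_ns() + spin_ns;

		while (now_ns() < deadline) {
			if (atomic_load(&request_done))
				return true;
		}
		return false;
	}

	static void wait_for_request(void)
	{
		/* Optimistic spin (~one jiffy at HZ=1000) before blocking. */
		if (spin_wait(1000000ull))
			return;

		/* Fall back to the wakeup-driven path. */
		pthread_mutex_lock(&lock);
		while (!atomic_load(&request_done))
			pthread_cond_wait(&wake, &lock);
		pthread_mutex_unlock(&lock);
	}

	static void *producer(void *arg)
	{
		struct timespec delay = { .tv_sec = 0, .tv_nsec = 500 * 1000 };

		(void)arg;
		nanosleep(&delay, NULL);       /* the "GPU" finishes after 500us */

		pthread_mutex_lock(&lock);
		atomic_store(&request_done, true);
		pthread_cond_broadcast(&wake); /* the "interrupt": wake all waiters */
		pthread_mutex_unlock(&lock);
		return NULL;
	}

	int main(void)
	{
		pthread_t t;

		pthread_create(&t, NULL, producer, NULL);
		wait_for_request();
		pthread_join(t, NULL);
		printf("request completed\n");
		return 0;
	}

The cost discussed in the thread maps onto the broadcast: every sleeping
waiter is woken at each "interrupt" just to recheck the flag, which is why
the bottom-half shows up so prominently once the irq is enabled early and
kept running.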