Re: [PATCH 08/15] drm/i915: Slaughter the thundering i915_wait_request herd

On 30/11/15 12:38, Chris Wilson wrote:
> On Mon, Nov 30, 2015 at 12:09:30PM +0000, Tvrtko Ursulin wrote:
>>
>> Hi,
>>
>> On 30/11/15 10:53, Chris Wilson wrote:
>>> On Sun, Nov 29, 2015 at 08:48:06AM +0000, Chris Wilson wrote:
>>>> +	/* Optimistic spin for the next jiffie before touching IRQs */
>>>> +	if (intel_breadcrumbs_add_waiter(req)) {
>>>> +		ret = __i915_spin_request(req, state);
>>>> +		if (ret == 0)
>>>> +			goto out;
>>>
>>> There are a couple of interesting side-effects here. As we now start
>>> up the irq in parallel and keep it running for longer, irq/i915 now
>>> consumes a lot of CPU time (like 50-100%!) for synchronous batches, but
>>> doesn't seem to interfere with latency (as spin_request is still nicely
>>> running and catching the request completion). That should also still
>>> work nicely on single-cpu machines as the irq enabling should not
>>> preempt us. The second interesting side-effect is that the synchronous
>>> loads that regressed with a 2us spin-request timeout are now happy again
>>> at 2us. Also with the active-request and the can-spin check from
>>> add_waiter, running a low-fps game with a compositor is not burning the
>>> CPU with any spinning.

>> Interesting? :) Sounds bad the way you presented it.
>>
>> Why and where is the thread burning so much CPU? Would per-engine
>> req tree locks help?
>
> Just being woken up after every batch and checking the seqno is
> that expensive. Almost as expensive as the IRQ handler itself! (I expect
> top is adding the IRQ handler time to the irq/i915 in this case; perf
> says that more time is spent in the IRQ than the bottom-half.) Almost
> all waiters will be on the same engine, so I don't expect finer-grained
> spinlocks to be hugely important.
>
>> Is this new CPU time or the one which would previously be accounted
>> against each waiter, polling in wait request?
>
> This is CPU time that used to be absorbed in i915_wait_request(),
> currently hidden by i915_spin_request(). It is the reason why having
> every waiter doing the check after every interrupt is such a big issue
> for some workloads.

So overall total CPU time is similar, just split differently?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx