Re: [PATCH v2] drm/i915: Keep the forcewake timer alive for 1ms past the most recent use

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 15/05/2017 13:35, Chris Wilson wrote:
On Mon, May 15, 2017 at 01:06:47PM +0100, Tvrtko Ursulin wrote:

On 15/05/2017 11:41, Chris Wilson wrote:
On Mon, May 15, 2017 at 11:14:32AM +0100, Tvrtko Ursulin wrote:

On 12/05/2017 23:16, Chris Wilson wrote:
Currently the timer is armed for 1ms after the first use and is killed
immediately, dropping the forcewake as early as possible. However, for

Correct for implicit grabs, but for explicit it is 1-2ms after the last use.

>From the put of the first, we don't rearm the timer on later puts.

What do you mean by first? I see the timer getting armed on the last put.

Once the timer is armed, the next put will do nothing.

very frequent operations the forcewake dance has a large impact on
latency and keeping the timer alive until we are idle is preferred. To

What workloads see the difference and by how much?

The issue I have is that we can't submit nops fast enough using

Fast enough for what? You mean just for your liking ie. we can be faster?

We used to be faster (single context nop). And we are still not close to
ringbuffer. :|

To my liking, but looking at Vk traces they have lots of little batches
(bordering on the nop) with perhaps one or two big ones. But we do hit
the lite-restore repeated submission path many times per frame. (Still
orders of magnitude less impact than gem_exec_nop! ;)

lite-restore. A large part of that was from the rpm get/put in the
tasklet, but I suspect ultimately it is the extra mmio/lite-restore that
is slowing us down. Now, I'm happy that the lite-restore to keep port[1]
accessible for the next context is a benefit, so I'm looking at how we
can improve the continual resubmission.

Ok.

At the time I've fixed the auto-release to go from 0-1 jiffies to
1-2ms, we talked about this conundrum - whether to consider the
first grab or last put for the timer. But we decided thorough
testing is needed to see if this would make a difference and what
power side effects it might have.

In the above scenario, it never goes off so we are the paying the worst
price of a useless dance. It's the periodic 1ms poll on an idle system
that will suffer most, but in pratice this will delay turning off by an
extra 1ms - and that may be the difference between 0.1% and 50% in power
consumption :|

What is a periodic 1ms poll on the idle system? If the system is
idle the auto-release timer will not be running.

Otherwise idle, except for the hypothetical poll to keep arming the
timer.

In the rapid short submission scenario I see the auto-release timer
potentially racing with the tasklet and needlessly dropping the fw.
But rapid short submission is so much faster than the 1-2ms
auto-release that this race must be very infrequent.

I instead expect mostly to see the timer run and find the
domain->wake_count > 0 due a tasklet running in parallel who has
grabbed the fw for itself.

Worse case scenario sounds like it would be some submission period
around the auto-release period but just shifted in "phase", no?

So I do see some benefit, but would just want to see some numbers in
the commit message and a more precise description of the scenario it
improves.

Would have to measure the 99.9th percentiles as we only affect those
rare racing nops, so the average we report is unaffected - certainly it
is not the 2us, on average, I'm trying to recover.

I've changed my mind over the weekend and started thinking we should have this even if the commit message did not fully explain everything.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux