Re: [PATCH 1/3] drm/i915: Use consistent forcewake auto-release timeout across kernel configs

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Tue, 5 Apr 2016 09:59:57 +0100



On Tue, Apr 05, 2016 at 09:54:58AM +0100, Tvrtko Ursulin wrote:
> 
> On 04/04/16 19:58, Chris Wilson wrote:
> >On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> >>
> >>Current implementation releases the forcewake at any time between
> >>straight away, and one jiffie from the last put, or first automatic
> >>grab.
> >
> >That isn't the problem though. The problem is that we set the timer on
> >first use rather than last use. All you are stating here is that by
> >lengthening the timeout on your system you reduce the number of times it
> >times out.
> 
> It is true the reduction I see is due lengthening of the average timeout.
> 
> But with regards to re-arming approach, I thought so initially
> myself, but then, if we expect bursty access then it shouldn't
> matter and the simplicity of doing it like it currently is better.
> 
> Even in practice, I noticed the effect of lengthening the timeout is
> much greater than moving the arming to the last access. And to get
> to very few to none auto releases on busy workloads we need in the
> regions of 5ms, which would be a big change. Or maybe not if you
> consider HZ=100 kernels.
> 
> It is very difficult to know what is actually better considering
> power between the CPU and GPU and performance. So I though the best
> would be to keep it similar to the current timings, just fix the
> dependency on HZ and also slack with hrtimers might help with
> something.
> 
> As a final data point, explicit puts and auto releases seems to be
> relatively balanced in my testing. With this patch T-Rex for example
> auto-releases in the region of 3-4 times in 10ms, with around 5-10
> explicit gets, and 5-10 implicit gets in 10ms.
> 
> A different, interrupt heavy workload (~20k irqs/sec) manages
> similarly 2-4 auto-releases per 10ms, and has ~250 explicit gets and
> ~380 implicit per 10ms.
> 
> Looking at the two I think the fact they manage to auto-release
> relatively similarly, compared to a huge different in fw gets,
> suggest burstyness. So I am not sure that any smarts with the
> release period would be interesting. At least not without serious
> power/perf measurements.
> 
> >Having said that, the conversion to hrtimer seems sensible but to add
> >tracking of the last access as opposed to first we either fallback to
> >jiffie (in which case hrtimer is moot) or rely on ktime_get_raw() being
> >fast enough for every register write. Hmm, my usual response to that has
> >been if it matters we avoid the heavyweight macros and use the _FW
> >interface - or even raw readl/writel.
> >
> >Could you try storing ktime_get_raw() on every access and rearming the
> >timer if it expires before last-access + min-period?
> 
> That would be similar to your patch from before my holiday, right?
> As I said above, I did not notice much change with that approach.
> Just extending the timeout has a much greater effect, but as I wrote
> above, I am not sure we want to change it.

There was just one bug in that patch in checking the expiration that
makes a huge difference, but if you please add the discussion above to
the commit log that would be invaluable.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx