On Tue, Apr 05, 2016 at 09:54:58AM +0100, Tvrtko Ursulin wrote: > > On 04/04/16 19:58, Chris Wilson wrote: > >On Mon, Apr 04, 2016 at 05:51:09PM +0100, Tvrtko Ursulin wrote: > >>From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > >> > >>Current implementation releases the forcewake at any time between > >>straight away, and one jiffie from the last put, or first automatic > >>grab. > > > >That isn't the problem though. The problem is that we set the timer on > >first use rather than last use. All you are stating here is that by > >lengthening the timeout on your system you reduce the number of times it > >times out. > > It is true the reduction I see is due lengthening of the average timeout. > > But with regards to re-arming approach, I thought so initially > myself, but then, if we expect bursty access then it shouldn't > matter and the simplicity of doing it like it currently is better. > > Even in practice, I noticed the effect of lengthening the timeout is > much greater than moving the arming to the last access. And to get > to very few to none auto releases on busy workloads we need in the > regions of 5ms, which would be a big change. Or maybe not if you > consider HZ=100 kernels. > > It is very difficult to know what is actually better considering > power between the CPU and GPU and performance. So I though the best > would be to keep it similar to the current timings, just fix the > dependency on HZ and also slack with hrtimers might help with > something. > > As a final data point, explicit puts and auto releases seems to be > relatively balanced in my testing. With this patch T-Rex for example > auto-releases in the region of 3-4 times in 10ms, with around 5-10 > explicit gets, and 5-10 implicit gets in 10ms. > > A different, interrupt heavy workload (~20k irqs/sec) manages > similarly 2-4 auto-releases per 10ms, and has ~250 explicit gets and > ~380 implicit per 10ms. > > Looking at the two I think the fact they manage to auto-release > relatively similarly, compared to a huge different in fw gets, > suggest burstyness. So I am not sure that any smarts with the > release period would be interesting. At least not without serious > power/perf measurements. > > >Having said that, the conversion to hrtimer seems sensible but to add > >tracking of the last access as opposed to first we either fallback to > >jiffie (in which case hrtimer is moot) or rely on ktime_get_raw() being > >fast enough for every register write. Hmm, my usual response to that has > >been if it matters we avoid the heavyweight macros and use the _FW > >interface - or even raw readl/writel. > > > >Could you try storing ktime_get_raw() on every access and rearming the > >timer if it expires before last-access + min-period? > > That would be similar to your patch from before my holiday, right? > As I said above, I did not notice much change with that approach. > Just extending the timeout has a much greater effect, but as I wrote > above, I am not sure we want to change it. There was just one bug in that patch in checking the expiration that makes a huge difference, but if you please add the discussion above to the commit log that would be invaluable. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx