Hi,

On 21/09/2018 10:13, kedar.j.karanje@xxxxxxxxx wrote:
From: "Kedar J. Karanje" <kedar.j.karanje@xxxxxxxxx> drm/i915: Context aware user agnostic EU/Slice/Sub-slice control within kernel Current GPU configuration code for i915 does not allow us to change EU/Slice/Sub-slice configuration dynamically. Its done only once while context is created. While particular graphics application is running, if we examine the command requests from user space, we observe that command density is not consistent. It means there is scope to change the graphics configuration dynamically even while context is running actively. This patch series proposes the solution to find the active pending load for all active context at given time and based on that, dynamically perform graphics configuration for each context. We use a hr (high resolution) timer with i915 driver in kernel to get a callback every few milliseconds (this timer value can be configured through debugfs, default is '0' indicating timer is in disabled state i.e. original system without any intervention).In the timer callback, we examine pending commands for a context in the queue, essentially, we intercept them before they are executed by GPU and we update context with required number of EUs. Two questions, how did we arrive at right timer value? and what's the right number of EUs? For the prior one, empirical data to achieve best performance in least power was considered. For the later one, we roughly categorized number of EUs logically based on platform. Now we compare number of pending commands with a particular threshold and then set number of EUs accordingly with update context. That threshold is also based on experiments & findings. If GPU is able to catch up with CPU, typically there are no pending commands, the EU config would remain unchanged there. In case there are more pending commands we reprogram context with higher number of EUs. Please note, here we are changing EUs even while context is running by examining pending commands every 'x' milliseconds. With this solution in place, on KBL-GT3 + Android we saw following pnp benefits without any performance degradation, power numbers mentioned here are system power. App /KPI | % Power | | Benefit | | (mW) | ---------------------------------| 3D Mark (Ice storm) | 2.30% | TRex On screen | 2.49% | TRex Off screen | 1.32% | ManhattanOn screen | 3.11% | Manhattan Off screen | 0.89% | AnTuTu 6.1.4 | 3.42% |
Good numbers! It is hard to argue against them. :) Even though, obviously, heuristics like the one you implemented can easily be fooled by different workloads, since you make no distinction between different engines, EUs vs fixed functions, external fences vs GPU over-subscription, and similar.
But then again, you have a control knob and it is off by default. So if it genuinely only ever helps typical use cases, and people can stomach a control knob, perhaps we can have it.
In any case, I think you would need to test against a lot more benchmarks to pass the threshold of whether upstream could consider this, and definitely not only under Android. It is just that these days I don't know with whom to put you in touch in order to recommend a list of benchmarks, or even to provide an automated system to run them. Anybody?
Regards, Tvrtko
Note - For KBL (GEN9) we cannot control at the sub-slice level; that was always a constraint. We always controlled the number of EUs rather than sub-slices/slices.

Praveen Diwakar (4):
  drm/i915: Get active pending request for given context
  drm/i915: Update render power clock state configuration for given context
  drm/i915: set optimum eu/slice/sub-slice configuration based on load type
  drm/i915: Predictive governor to control eu/slice/subslice based on workload

 drivers/gpu/drm/i915/i915_debugfs.c        | 94 +++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_drv.c            |  1 +
 drivers/gpu/drm/i915/i915_drv.h            |  6 ++
 drivers/gpu/drm/i915/i915_gem_context.c    | 52 +++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.h    | 52 +++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  5 ++
 drivers/gpu/drm/i915/intel_lrc.c           | 47 +++++++++++++++
 7 files changed, 256 insertions(+), 1 deletion(-)

--
2.7.4
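Per the diffstat, most of the control knob lands in i915_debugfs.c. As a rough illustration of a debugfs interval knob like the one described (non-zero value in ms arms the timer, 0 keeps the governor disabled), with all names hypothetical rather than what the series actually uses:

/*
 * Sketch of the debugfs control, assuming the hypothetical pred_timer and
 * predictive_load_ms fields from the governor sketch above.
 */
#include <linux/debugfs.h>

static int i915_predictive_load_set(void *data, u64 val)
{
	struct drm_i915_private *i915 = data;

	i915->predictive_load_ms = val;
	if (val)
		hrtimer_start(&i915->pred_timer, ms_to_ktime(val),
			      HRTIMER_MODE_REL);
	else
		hrtimer_cancel(&i915->pred_timer);

	return 0;
}

static int i915_predictive_load_get(void *data, u64 *val)
{
	struct drm_i915_private *i915 = data;

	*val = i915->predictive_load_ms;
	return 0;
}

DEFINE_SIMPLE_ATTRIBUTE(i915_predictive_load_fops,
			i915_predictive_load_get,
			i915_predictive_load_set, "%llu\n");

/* Registered from i915_debugfs.c along the lines of:
 * debugfs_create_file("i915_predictive_load", 0644, minor->debugfs_root,
 *		       to_i915(minor->dev), &i915_predictive_load_fops);
 */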
_______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx