Hi Joonas,

On Fri, Dec 14, 2018 at 3:57 PM Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> wrote:
>
> Quoting Ankit Navik (2018-12-11 12:14:17)
> > drm/i915: Context aware user agnostic EU/Slice/Sub-slice control
> > within kernel
> >
> > The current GPU configuration code for i915 does not allow us to
> > change the EU/slice/sub-slice configuration dynamically; it is done
> > only once, while the context is created.
> >
> > While a particular graphics application is running, if we examine the
> > command requests from user space, we observe that command density is
> > not consistent. This means there is scope to change the graphics
> > configuration dynamically even while the context is actively running.
> > This patch series proposes a solution: find the pending load for all
> > active contexts at a given time and, based on that, dynamically
> > perform the graphics configuration for each context.
> >
> > We use an hr (high resolution) timer in the i915 kernel driver to get
> > a callback every few milliseconds (this timer value can be configured
> > through debugfs; the default is '0', indicating the timer is
> > disabled, i.e. the original system without any intervention). In the
> > timer callback, we examine the pending commands for a context in the
> > queue; essentially, we intercept them before they are executed by the
> > GPU and update the context with the required number of EUs.
> >
> > Two questions: how did we arrive at the right timer value, and what
> > is the right number of EUs? For the former, we used empirical data to
> > achieve the best performance at the least power. For the latter, we
> > roughly categorized the number of EUs logically based on the
> > platform. We then compare the number of pending commands against a
> > threshold and set the number of EUs accordingly with an updated
> > context. That threshold is also based on experiments and findings.
> > If the GPU is able to keep up with the CPU, there are typically no
> > pending commands, and the EU configuration remains unchanged.
> > If there are more pending commands, we reprogram the context with a
> > higher number of EUs. Please note that here we are changing EUs even
> > while the context is running, by examining the pending commands every
> > 'x' milliseconds.
>
> On the overall strategy. This will be unsuitable to be merged as a
> debugfs interface. So is the idea to evolve into a sysfs interface? As
> this seems to require tuning for each specific workload, I don't think
> that would scale too well if you consider a desktop distro?

We started initially with a debugfs interface. I have added a comment to
move the functionality to a sysfs interface. Yes, I will consider the
desktop distro case and share the detailed results.

> Also, there's the patch series to enable/disable subslices with VME
> hardware (the other dynamic slice shutdown/SSEU series) depending on
> the type of load being run. Certain workloads would hang the system if
> they're executed with the full subslice configuration. In that light,
> it would make more sense if the applications would be the ones
> reporting their optimal running configuration.

I think that series exposes RPCS for Gen 11 only, for the VME use case.
This patch I have tested on KBL (Gen 9). I will consider other Gen 9
platforms as well.

> > With this solution in place, on KBL-GT3 + Android we saw the
> > following PnP benefits; the power numbers mentioned here are system
> > power.
> >
> > App /KPI             | % Power |
> >                      | Benefit |
> >                      | (mW)    |
> > ---------------------|---------|
> > 3D Mark (Ice storm)  | 2.30%   |
> > TRex On screen       | 2.49%   |
> > TRex Off screen      | 1.32%   |
> > Manhattan On screen  | 3.11%   |
> > Manhattan Off screen | 0.89%   |
> > AnTuTu 6.1.4         | 3.42%   |
> > SynMark2             | 1.70%   |
>
> Just to verify, these numbers are true while there's no negative effect
> on the benchmark scores?

Yes, there is no impact on the benchmark scores.

Thank you, Joonas, for your valuable feedback.
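To make the governor idea above concrete, here is a minimal userspace
sketch of the classification step: pending commands are bucketed into a
load type, and each load type maps to an EU count. The thresholds and EU
counts below are purely illustrative placeholders I made up for this
sketch, not the empirically tuned values from the series:

```c
#include <assert.h>

/* Hypothetical load buckets; the real series derives its categories
 * per platform from experiments. */
enum load_type { LOAD_TYPE_LOW, LOAD_TYPE_MEDIUM, LOAD_TYPE_HIGH };

/* Classify a context by its number of pending requests, as sampled
 * from the periodic timer callback. The threshold of 8 is invented
 * for illustration only. */
static enum load_type classify_load(unsigned int pending_requests)
{
	if (pending_requests == 0)
		return LOAD_TYPE_LOW;	/* GPU keeping up with the CPU */
	if (pending_requests < 8)
		return LOAD_TYPE_MEDIUM;
	return LOAD_TYPE_HIGH;		/* backlog: give it more EUs */
}

/* Map a load bucket to an EU count to program into the context.
 * These counts are illustrative, not any platform's real fusing. */
static unsigned int eu_count_for_load(enum load_type load)
{
	switch (load) {
	case LOAD_TYPE_LOW:
		return 6;
	case LOAD_TYPE_MEDIUM:
		return 12;
	default:
		return 24;
	}
}
```

In the series itself this decision runs in the hr timer callback and
feeds an updated render power clock state into the context, so an idle
context stays at its current configuration while a backlogged one is
reprogrammed upward on the next sample.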
Regards,
Ankit

> Regards, Joonas
>
> > Note - For KBL (Gen 9) we cannot control at the sub-slice level; that
> > was always a constraint. We always controlled the number of EUs
> > rather than sub-slices/slices. We have also observed that GPU core
> > residency improves by 1.03%.
> >
> > Praveen Diwakar (4):
> >   drm/i915: Get active pending request for given context
> >   drm/i915: Update render power clock state configuration for given
> >     context
> >   drm/i915: set optimum eu/slice/sub-slice configuration based on
> >     load type
> >   drm/i915: Predictive governor to control eu/slice/subslice
> >
> >  drivers/gpu/drm/i915/i915_debugfs.c      | 90 +++++++++++++++++++++++++++++++-
> >  drivers/gpu/drm/i915/i915_drv.c          |  4 ++
> >  drivers/gpu/drm/i915/i915_drv.h          |  9 ++++
> >  drivers/gpu/drm/i915/i915_gem_context.c  | 23 ++++++++
> >  drivers/gpu/drm/i915/i915_gem_context.h  | 39 ++++++++++++++
> >  drivers/gpu/drm/i915/i915_request.c      |  2 +
> >  drivers/gpu/drm/i915/intel_device_info.c | 47 ++++++++++++++++-
> >  drivers/gpu/drm/i915/intel_lrc.c         | 16 +++++-
> >  8 files changed, 226 insertions(+), 4 deletions(-)
> >
> > --
> > 2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx