Hi Joonas,

On Fri, Dec 14, 2018 at 3:57 PM Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> wrote:
>
> Quoting Ankit Navik (2018-12-11 12:14:17)
> > drm/i915: Context aware user agnostic EU/Slice/Sub-slice control
> > within kernel
> >
> > The current GPU configuration code for i915 does not allow us to
> > change the EU/slice/sub-slice configuration dynamically; it is done
> > only once, while the context is created.
> >
> > While a particular graphics application is running, if we examine the
> > command requests from user space, we observe that command density is
> > not consistent. This means there is scope to change the graphics
> > configuration dynamically even while the context is actively running.
> > This patch series proposes a solution: find the pending load for all
> > active contexts at a given time and, based on that, dynamically
> > perform the graphics configuration for each context.
> >
> > We use an hr (high resolution) timer in the i915 kernel driver to get
> > a callback every few milliseconds (this timer value can be configured
> > through debugfs; the default is '0', indicating the timer is
> > disabled, i.e. the original system without any intervention). In the
> > timer callback, we examine the pending commands for a context in the
> > queue; essentially, we intercept them before they are executed by the
> > GPU and update the context with the required number of EUs.
> >
> > Two questions: how did we arrive at the right timer value, and what
> > is the right number of EUs? For the former, we used empirical data to
> > achieve the best performance at the least power. For the latter, we
> > roughly categorized the number of EUs logically based on the
> > platform. We then compare the number of pending commands against a
> > threshold and set the number of EUs accordingly with an updated
> > context. That threshold is also based on experiments and findings.
> > If the GPU is able to keep up with the CPU, there are typically no
> > pending commands, and the EU configuration remains unchanged.
> > If there are more pending commands, we reprogram the context with a
> > higher number of EUs. Please note that here we are changing EUs even
> > while the context is running, by examining the pending commands every
> > 'x' milliseconds.
>
> On the overall strategy. This will be unsuitable to be merged as a
> debugfs interface. So is the idea to evolve into a sysfs interface? As
> this seems to require tuning for each specific workload, I don't think
> that would scale too well if you consider a desktop distro?

We started initially with a debugfs interface. I have added a comment to
move the functionality to a sysfs interface. Yes, I will consider the
desktop distro case and share the detailed results.

> Also, there's the patch series to enable/disable subslices with VME
> hardware (the other dynamic slice shutdown/SSEU series) depending on
> the type of load being run. Certain workloads would hang the system if
> they're executed with the full subslice configuration. In that light,
> it would make more sense if the applications would be the ones
> reporting their optimal running configuration.

I think that series exposes RPCS for Gen 11 only, for the VME use case.
This patch I have tested on KBL (Gen 9). I will consider other Gen 9
platforms as well.

> > With this solution in place, on KBL-GT3 + Android we saw the
> > following PnP benefits; the power numbers mentioned here are system
> > power.
> >
> > App /KPI             | % Power |
> >                      | Benefit |
> >                      | (mW)    |
> > ---------------------|---------|
> > 3D Mark (Ice storm)  | 2.30%   |
> > TRex On screen       | 2.49%   |
> > TRex Off screen      | 1.32%   |
> > Manhattan On screen  | 3.11%   |
> > Manhattan Off screen | 0.89%   |
> > AnTuTu 6.1.4         | 3.42%   |
> > SynMark2             | 1.70%   |
>
> Just to verify, these numbers are true while there's no negative effect
> on the benchmark scores?

Yes, there is no impact on the benchmark scores.

Thank you, Joonas, for your valuable feedback.
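To make the governor idea above concrete, here is a minimal userspace
sketch of the classification step: pending commands are bucketed into a
load type, and each load type maps to an EU count. The thresholds and EU
counts below are purely illustrative placeholders I made up for this
sketch, not the empirically tuned values from the series:

```c
#include <assert.h>

/* Hypothetical load buckets; the real series derives its categories
 * per platform from experiments. */
enum load_type { LOAD_TYPE_LOW, LOAD_TYPE_MEDIUM, LOAD_TYPE_HIGH };

/* Classify a context by its number of pending requests, as sampled
 * from the periodic timer callback. The threshold of 8 is invented
 * for illustration only. */
static enum load_type classify_load(unsigned int pending_requests)
{
	if (pending_requests == 0)
		return LOAD_TYPE_LOW;	/* GPU keeping up with the CPU */
	if (pending_requests < 8)
		return LOAD_TYPE_MEDIUM;
	return LOAD_TYPE_HIGH;		/* backlog: give it more EUs */
}

/* Map a load bucket to an EU count to program into the context.
 * These counts are illustrative, not any platform's real fusing. */
static unsigned int eu_count_for_load(enum load_type load)
{
	switch (load) {
	case LOAD_TYPE_LOW:
		return 6;
	case LOAD_TYPE_MEDIUM:
		return 12;
	default:
		return 24;
	}
}
```

In the series itself this decision runs in the hr timer callback and
feeds an updated render power clock state into the context, so an idle
context stays at its current configuration while a backlogged one is
reprogrammed upward on the next sample.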
Regards,
Ankit

> Regards, Joonas
>
> > Note - For KBL (Gen 9) we cannot control at the sub-slice level; that
> > was always a constraint. We always controlled the number of EUs
> > rather than sub-slices/slices. We have also observed that GPU core
> > residency improves by 1.03%.
> >
> > Praveen Diwakar (4):
> >   drm/i915: Get active pending request for given context
> >   drm/i915: Update render power clock state configuration for given
> >     context
> >   drm/i915: set optimum eu/slice/sub-slice configuration based on
> >     load type
> >   drm/i915: Predictive governor to control eu/slice/subslice
> >
> >  drivers/gpu/drm/i915/i915_debugfs.c      | 90 +++++++++++++++++++++++++++++++-
> >  drivers/gpu/drm/i915/i915_drv.c          |  4 ++
> >  drivers/gpu/drm/i915/i915_drv.h          |  9 ++++
> >  drivers/gpu/drm/i915/i915_gem_context.c  | 23 ++++++++
> >  drivers/gpu/drm/i915/i915_gem_context.h  | 39 ++++++++++++++
> >  drivers/gpu/drm/i915/i915_request.c      |  2 +
> >  drivers/gpu/drm/i915/intel_device_info.c | 47 ++++++++++++++++-
> >  drivers/gpu/drm/i915/intel_lrc.c         | 16 +++++-
> >  8 files changed, 226 insertions(+), 4 deletions(-)
> >
> > --
> > 2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx