Re: [Intel-gfx] [PATCH] drm/i915/gt: Limit frequency drop to RPe on parking

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Tue, 24 Nov 2020 20:16:13 +0000

Quoting Rodrigo Vivi (2020-11-24 19:46:29)
> On Tue, Nov 24, 2020 at 06:35:21PM +0000, Chris Wilson wrote:
> > We treat idling the GT (intel_rps_park) as a downclock event, and reduce
> > the frequency we intend to restart the GT with. Since the two workloads
> > are likely related (e.g. a compositor rendering every 16ms), we want to
> > carry the frequency and load information from across the idling.
> > However, we do also need to update the frequencies so that workloads
> > that run for less than 1ms are autotuned by RPS (otherwise we leave
> > compositors running at max clocks, draining excess power). Conversely,
> > if we try to run too slowly, the next workload has to run longer. Since
> > there is a hysteresis in the power graph, below a certain frequency
> > running a short workload for longer consumes more energy than running it
> > slightly higher for less time. The exact balance point is unknown
> > beforehand, but measurements with 30fps media playback indicate that RPe
> > is a better choice.
> > 
> > Reported-by: Edward Baker <edward.baker@xxxxxxxxx>
> > Fixes: 043cd2d14ede ("drm/i915/gt: Leave rps->cur_freq on unpark")
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > Cc: Edward Baker <edward.baker@xxxxxxxxx>
> > Cc: Andi Shyti <andi.shyti@xxxxxxxxx>
> > Cc: Lyude Paul <lyude@xxxxxxxxxx>
> > Cc: <stable@xxxxxxxxxxxxxxx> # v5.8+
> > ---
> >  drivers/gpu/drm/i915/gt/intel_rps.c | 4 ++++
> >  1 file changed, 4 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_rps.c b/drivers/gpu/drm/i915/gt/intel_rps.c
> > index b13e7845d483..f74d5e09e176 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_rps.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_rps.c
> > @@ -907,6 +907,10 @@ void intel_rps_park(struct intel_rps *rps)
> >               adj = -2;
> >       rps->last_adj = adj;
> >       rps->cur_freq = max_t(int, rps->cur_freq + adj, rps->min_freq);
> > +     if (rps->cur_freq < rps->efficient_freq) {
> > +             rps->cur_freq = rps->efficient_freq;
> > +             rps->last_adj = 0;
> 
> this is indeed the smallest fix we can propagate:
> 
> 
> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@xxxxxxxxx>
> 
> but I wonder now if we couldn't simply kill the last_adj now and always go
> with the rpe on park/unpark

Since we often have very bursty workloads that are less than 1ms, we do
want to keep the frequency across idling, or else we incur more latency
than is desired by the user (although unpark latency is no joke,
although that is mostly the context switches). The compromise for always
running shorter than an RPS interval is to "gradually" reduce the
frequency (so that compositors do not get stuck at max clocks, yet those
very same compositors also do require very quick autotuning so that
animations are smooth from idle.) Compute is another one where they have
both sustained and bursty workloads, and the shorter-than-RPS bursty
workloads are naturally expected to be to low latency.

So I still think keeping cur_freq is most often the best approach.
-Chris