On Thu, 20 Dec 2012 17:57:04 -0800 Ben Widawsky <ben at bwidawsk.net> wrote:
> On Thu, Dec 20, 2012 at 03:34:11PM -0800, Jesse Barnes wrote:
> > On Wed, 24 Oct 2012 12:05:35 -0700 Ben Widawsky <ben at bwidawsk.net> wrote:
> > > On Sun, 21 Oct 2012 15:44:02 +0100 Chris Wilson <chris at chris-wilson.co.uk> wrote:
> > > > Even though we do not use the EI mode for determining when to change
> > > > GPU frequencies for RPS, changing this value causes no up interrupts
> > > > to be generated whilst an OpenGL client runs.
> > > >
> > > > Fixes regression from
> > > > commit 1ee9ae3244c4789f3184c5123f3b2d7e405b3f4c
> > > > Author: Daniel Vetter <daniel.vetter at ffwll.ch>
> > > > Date:   Wed Aug 15 10:41:45 2012 +0200
> > > >
> > > >     drm/i915: use hsw rps tuning values everywhere on gen6+
> > > >
> > > > Reported-by: Eric Anholt <eric at anholt.net>
> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > > > Cc: Eric Anholt <eric at anholt.net>
> > > > Cc: Ben Widawsky <ben at bwidawsk.net>
> > > > Cc: Daniel Vetter <daniel.vetter at ffwll.ch>
> > > > Cc: stable at vger.kernel.org
> > > > ---
> > > >  drivers/gpu/drm/i915/intel_pm.c | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > >
> > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > index 81e88c2..15b585e 100644
> > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > @@ -2493,7 +2493,7 @@ static void gen6_enable_rps(struct drm_device *dev)
> > > >  	I915_WRITE(GEN6_RP_UP_THRESHOLD, 59400);
> > > >  	I915_WRITE(GEN6_RP_DOWN_THRESHOLD, 245000);
> > > > -	I915_WRITE(GEN6_RP_UP_EI, 66000);
> > > > +	I915_WRITE(GEN6_RP_UP_EI, IS_HASWELL(dev) ? 66000 : 100000);
> > > >  	I915_WRITE(GEN6_RP_DOWN_EI, 350000);
> > > >
> > > >  	I915_WRITE(GEN6_RP_IDLE_HYSTERSIS, 10);
> > >
> > > I've not verified this with interrupts, but simply by reading back the current frequency using the sysfs interface.
> > >
> > > What I've seen running xonotic on Ivybridge is that we do bump the frequency initially, and then it stops. Nearing the end of the demo, we again raise the frequency. Note that on my system, both before and after this patch, I am able to get to the max GPU frequency with the xonotic demo.
> > >
> > > Specifically, on my IVB, which has a range of 350->1100 with an RP1 of 650, I see the following (the demo is roughly 2 minutes):
> > >
> > > without patch:
> > >   Within a few seconds we cycle up to 750
> > >   Nothing for about 30 seconds
> > >   very slowly cycle up to 1100 (*just* before the demo ends)
> > >   demo ends; throttle down to 350 quickly
> > >
> > > with patch:
> > >   Within a few seconds we cycle up to 1000
> > >   Nothing for about 30 seconds
> > >   cycle up to 1100
> > >   demo ends; throttle down to 350 slowly
> > >
> > > I think if this fixes someone's critical issue, it's great, but unfortunately I do not see the problem the patch claims to fix. Furthermore, none of us can really make sense of why this has the effect that it does, but I believe a lot of that is because the workloads we run (in this case xonotic) are very much black boxes.
> > >
> > > Personally, on this IVB, I think the behavior before the patch is more desirable because it stays near RP1 for a longer period of time and drops back to the minimum quickly (but it's definitely a matter of opinion).
> >
> > It's a balance between power and performance.
> > Running at the higher freqs is definitely less power efficient, but without this patch we clearly have a performance regression (running at 750 instead of 1000 MHz for most of the demo, plus the cases Chris saw).
> >
> > But we also don't want to prevent RC6 entry on vulnerable systems, so maybe we need two sets of values, or a broader set of changes that works better across a swath of workloads.
>
> To me this sounds like an idea that is better on paper than in practice. Unless we can generalize the various sets of values with knobs for users to pick from, it will really be overwhelming (I feel), and will probably provide more bad data than good. Fine-grained knobs are already given via the registers, and really, pestering users can toy with those if they want. Perhaps it's on us to add better comments to i915_reg.h to explain a bit better how we think it works?
>
> > Also, the initial patch to use the HSW values implies that the values are applicable across generations. They're not. They're very specific to a given generation, and potentially even to different versions of a given generation, so sharing the values is probably a bad idea in general.
> >
> > --
> > Jesse Barnes, Intel Open Source Technology Center
>
> To set the record straight, I am pretty sure we can agree on a few things:
> 1. We don't really know wtf.
> 2. There will be no globally good set of values.
> 2a. There may even be no globally decent set of values.
> 3. One workload is insufficient to determine anything.
>
> If a workload is obviously too GPU-render-heavy, I think a decent solution is to just use the sysfs parameters to force a specific min. We don't really need to be skittish about setting this, as really *most* things are too GPU-render-heavy for us at the moment. For the general case I'll assert the performance 'profile' is the right thing unless someone proves otherwise, and here we just want the set of values which throttle down faster than they throttle up.

In the email in my head, that said "the power 'profile'".

> I think there is still some value in the proof-of-concept stuff you were hoping to do around a system-wide GPU power governor thing. Maybe we can just set the RPS autothrottle values to be as dumb as possible, and then use that governor to set the min/max frequencies as it sees fit. Another similar option is to try to use SW RC6. If we had an intern, I'd even suggest picking some decent ranges for all the thresholds and having that intern run benchmarks with all the different values.
>
> Okay, done wasting your time now..
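
For anyone who wants to try the "force a specific min" approach Ben describes, a minimal userspace sketch follows. It assumes the RPS sysfs attributes live at /sys/class/drm/card0/ under the names gt_cur_freq_mhz and gt_min_freq_mhz; neither the path nor the attribute names come from this thread, so check your own kernel before relying on them.

/* rps-min.c: hypothetical helper, not part of the i915 tree.
 * Reads the current RPS frequency and optionally pins a minimum,
 * assuming the sysfs paths below exist on the running kernel.
 */
#include <stdio.h>
#include <stdlib.h>

#define GT_CUR "/sys/class/drm/card0/gt_cur_freq_mhz"   /* assumed path */
#define GT_MIN "/sys/class/drm/card0/gt_min_freq_mhz"   /* assumed path */

int main(int argc, char **argv)
{
	FILE *f;
	int cur_mhz;

	/* Read back the current frequency, as Ben did while watching xonotic. */
	f = fopen(GT_CUR, "r");
	if (!f || fscanf(f, "%d", &cur_mhz) != 1) {
		perror("reading " GT_CUR);
		return EXIT_FAILURE;
	}
	fclose(f);
	printf("current GPU frequency: %d MHz\n", cur_mhz);

	/* Optionally force a specific min, e.g. "./rps-min 650" for RP1 on Ben's IVB. */
	if (argc > 1) {
		f = fopen(GT_MIN, "w");
		if (!f || fprintf(f, "%s\n", argv[1]) < 0) {
			perror("writing " GT_MIN);
			return EXIT_FAILURE;
		}
		fclose(f);
	}

	return EXIT_SUCCESS;
}

Run as root; for example "./rps-min 650" would hold the floor at RP1 on Ben's example IVB (range 350->1100), while running it with no argument only prints the current frequency.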