Re: [PATCH] drm/i915: Make sample_c messages go faster on Haswell.

On Thursday, October 30, 2014 09:26:01 PM Ville Syrjälä wrote:
> On Thu, Oct 30, 2014 at 10:32:38AM -0700, Kenneth Graunke wrote:
> > On Thursday, October 30, 2014 01:01:30 PM Ville Syrjälä wrote:
> > > On Thu, Oct 30, 2014 at 02:32:40AM -0700, Kenneth Graunke wrote:
> > > > On Thursday, October 30, 2014 11:00:51 AM Ville Syrjälä wrote:
> > > > > On Thu, Oct 30, 2014 at 10:50:03AM +0200, Ville Syrjälä wrote:
> > > > > > On Wed, Oct 29, 2014 at 03:12:43PM -0700, Kenneth Graunke wrote:
> > > > > > > Haswell significantly improved the performance of sampler_c messages,
> > > > > > > but the optimization appears to be off by default.  Later platforms
> > > > > > > remove this bit, and apparently always enable the optimization.
> > > > > > > 
> > > > > > > Improves performance in "Counter Strike: Global Offensive" by 18%
> > > > > > > at default settings on Iris Pro.  No Piglit regressions.
> > > > > > 
> > > > > > Nice. We need more bits like this ;)
> > > > > > 
> > > > > > > 
> > > > > > > Signed-off-by: Kenneth Graunke <kenneth@xxxxxxxxxxxxx>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/i915/i915_reg.h | 1 +
> > > > > > >  drivers/gpu/drm/i915/intel_pm.c | 4 ++++
> > > > > > >  2 files changed, 5 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > > > > > > index 77fce96..340821a 100644
> > > > > > > --- a/drivers/gpu/drm/i915/i915_reg.h
> > > > > > > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > > > > > > @@ -5952,6 +5952,7 @@ enum punit_power_well {
> > > > > > >  #define  HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE    (1 << 6)
> > > > > > >  
> > > > > > >  #define HALF_SLICE_CHICKEN3		0xe184
> > > > > > > +#define   HSW_SAMPLE_C_PERFORMANCE	(1<<9)
> > > > > > >  #define   GEN8_CENTROID_PIXEL_OPT_DIS	(1<<8)
> > > > > > >  #define   GEN8_SAMPLER_POWER_BYPASS_DIS	(1<<1)
> > > > > > >  
> > > > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > index 7a69eba..50c72a7 100644
> > > > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > @@ -5736,6 +5736,10 @@ static void haswell_init_clock_gating(struct drm_device *dev)
> > > > > > >  	I915_WRITE(GEN7_GT_MODE,
> > > > > > >  		   GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
> > > > > > >  
> > > > > > > +	/* Make sample_c messages faster. */
> > > > > > 
> > > > > > I found a name for it in the w/a database.
> > > > > > 
> > > > > > WaSampleCChickenBitEnable:hsw
> > > > > > 
> > > > > > Reviewed-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > > > > 
> > > > > Oh actually it says palette won't work when this bit is on. I'm assuming
> > > > > that's the texture palette. Do we have any use of that anywhere?
> > > > 
> > > > That's a good point.  3DSTATE_SAMPLER_PALETTE_LOAD and the A8P8/indexed
> > > > formats aren't used by Mesa or xf86-video-intel, but it looks like they might
> > > > be used by libva.
> > > > 
> > > > Can someone confirm that libva does use the sampler palette?
> > > > 
> > > > If they do, what do we do about it?
> > > 
> > > I suppose the best option then would be to use an LRI from a batch,
> > > which means the register would need to be added to the cmd parser
> > > white list. This is one of the context saved registers so doing the
> > > LRI just once per context should be enough.
> > 
> > I don't like that solution.  For one, it's impossible - you can't LRI from 
> > userspace batches, even if you add it to the kernel command parser's 
> > whitelist, because the hardware scanner is still enabled.  Given that I've 
> > been waiting two years for this capability, I want to find a more immediate
> > solution.
> 
> Ah. I've somehow convinced myself the cmd parser might actually be doing
> something besides just eating CPU cycles these days. But I guess not.
> 
> > 
> > Another option is to have some sort of execbuf flag...maybe a 3D/Media "usage"
> > flag.  If set to 3D, write 0x6000200...if media, write 0x6000000.  Or
> > something specific.  I do hate adding more junk to the execbuf path, though.
> > 
> > Other ideas?
> 
> Fast vs. slow flag? :)
> 
> More seriously, one somewhat crappy option would be to initialize that
> bit to 1 for all explicit contexts, and then have the kernel always turn
> it off before executing something with the default context. It's not
> unlike how we imagined the RS stuff would work since old userspace
> doesn't know to turn RS off when using the default context.

Interesting idea - that might work.  We don't need mid-batch changes either.
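
To make that concrete, here is a rough sketch of the policy: explicit
contexts run with the bit set, the default context runs with it cleared.
Only the bit position (1<<9) comes from the patch.  The struct and helper
names are hypothetical, not existing i915 code, and the sketch assumes
HALF_SLICE_CHICKEN3 takes masked writes (bits 31:16 select which of bits
15:0 are updated).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* From the patch: bit 9 of HALF_SLICE_CHICKEN3. */
#define HSW_SAMPLE_C_PERFORMANCE (1u << 9)

/* Masked-write encoding (assumed): bits 31:16 pick the bits to update. */
static inline uint32_t masked_bit_enable(uint32_t bit)
{
	return (bit << 16) | bit;
}

static inline uint32_t masked_bit_disable(uint32_t bit)
{
	return bit << 16;
}

/* Hypothetical stand-in for however the kernel identifies a context. */
struct ctx_sketch {
	bool is_default;
};

/*
 * Hypothetical helper: the value to write to HALF_SLICE_CHICKEN3 before
 * executing a batch on ctx.  Default context: optimization off, so old
 * userspace that may use the palette keeps working.  Explicit context:
 * optimization on.
 */
static uint32_t sample_c_chicken_value(const struct ctx_sketch *ctx)
{
	assert(ctx != NULL);

	if (ctx->is_default)
		return masked_bit_disable(HSW_SAMPLE_C_PERFORMANCE);

	return masked_bit_enable(HSW_SAMPLE_C_PERFORMANCE);
}
```

Where such a write would actually be issued (and how often) is left open
here; the sketch only shows the per-context decision.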

I don't think HALF_SLICE_CHICKEN3 is part of the logical context, FWIW.

Before we get too much further...we should check if libva is actually broken.  
I don't know if this means the sampler palette completely doesn't work, or if 
it just means sample_c doesn't work with the palette.  If it's the latter, 
we're probably fine, because I doubt libva uses sample_c.
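
For reference, the two execbuf values I floated earlier (0x6000200 for 3D,
0x6000000 for media) split into a write mask and a value as sketched below,
again assuming the masked-write layout (mask in bits 31:16, value in bits
15:0).  The helper names are made up for illustration.  Note that the mask
in those values covers bit 10 as well as bit 9; the sketch just reports
that and makes no claim about what bit 10 controls.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the two halves of a masked register write. */
struct masked_write {
	uint16_t mask;   /* bits 31:16: which of bits 15:0 get updated */
	uint16_t value;  /* bits 15:0: new values for the selected bits */
};

/* Split a raw 32-bit masked write into its mask and value halves. */
static struct masked_write decode_masked(uint32_t raw)
{
	struct masked_write w = {
		.mask  = (uint16_t)(raw >> 16),
		.value = (uint16_t)(raw & 0xffffu),
	};
	return w;
}

/* True if the write both selects the given bit and sets it to 1. */
static bool write_sets_bit(uint32_t raw, unsigned int bit)
{
	struct masked_write w = decode_masked(raw);

	assert(bit < 16);
	return ((w.mask >> bit) & 1u) && ((w.value >> bit) & 1u);
}

/* True if the write selects the given bit and clears it to 0. */
static bool write_clears_bit(uint32_t raw, unsigned int bit)
{
	struct masked_write w = decode_masked(raw);

	assert(bit < 16);
	return ((w.mask >> bit) & 1u) && !((w.value >> bit) & 1u);
}
```

With these, write_sets_bit(0x6000200, 9) holds for the 3D value and
write_clears_bit(0x6000000, 9) holds for the media value, which matches
the intent of turning the sample_c bit on for 3D and off for media.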

--Ken
