On Thursday, October 30, 2014 09:26:01 PM Ville Syrjälä wrote:
> On Thu, Oct 30, 2014 at 10:32:38AM -0700, Kenneth Graunke wrote:
> > On Thursday, October 30, 2014 01:01:30 PM Ville Syrjälä wrote:
> > > On Thu, Oct 30, 2014 at 02:32:40AM -0700, Kenneth Graunke wrote:
> > > > On Thursday, October 30, 2014 11:00:51 AM Ville Syrjälä wrote:
> > > > > On Thu, Oct 30, 2014 at 10:50:03AM +0200, Ville Syrjälä wrote:
> > > > > > On Wed, Oct 29, 2014 at 03:12:43PM -0700, Kenneth Graunke wrote:
> > > > > > > Haswell significantly improved the performance of sampler_c messages,
> > > > > > > but the optimization appears to be off by default. Later platforms
> > > > > > > remove this bit, and apparently always enable the optimization.
> > > > > > >
> > > > > > > Improves performance in "Counter Strike: Global Offensive" by 18%
> > > > > > > at default settings on Iris Pro. No Piglit regressions.
> > > > > >
> > > > > > Nice. We need more bits like this ;)
> > > > > >
> > > > > > >
> > > > > > > Signed-off-by: Kenneth Graunke <kenneth@xxxxxxxxxxxxx>
> > > > > > > ---
> > > > > > >  drivers/gpu/drm/i915/i915_reg.h | 1 +
> > > > > > >  drivers/gpu/drm/i915/intel_pm.c | 4 ++++
> > > > > > >  2 files changed, 5 insertions(+)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> > > > > > > index 77fce96..340821a 100644
> > > > > > > --- a/drivers/gpu/drm/i915/i915_reg.h
> > > > > > > +++ b/drivers/gpu/drm/i915/i915_reg.h
> > > > > > > @@ -5952,6 +5952,7 @@ enum punit_power_well {
> > > > > > >  #define HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE (1 << 6)
> > > > > > >
> > > > > > >  #define HALF_SLICE_CHICKEN3 0xe184
> > > > > > > +#define HSW_SAMPLE_C_PERFORMANCE (1<<9)
> > > > > > >  #define GEN8_CENTROID_PIXEL_OPT_DIS (1<<8)
> > > > > > >  #define GEN8_SAMPLER_POWER_BYPASS_DIS (1<<1)
> > > > > > >
> > > > > > > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > index 7a69eba..50c72a7 100644
> > > > > > > --- a/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > > > > > > @@ -5736,6 +5736,10 @@ static void haswell_init_clock_gating(struct drm_device *dev)
> > > > > > >  	I915_WRITE(GEN7_GT_MODE,
> > > > > > >  		   GEN6_WIZ_HASHING_MASK | GEN6_WIZ_HASHING_16x4);
> > > > > > >
> > > > > > > +	/* Make sample_c messages faster. */
> > > > > >
> > > > > > I found a name for it in the w/a database.
> > > > > >
> > > > > > WaSampleCChickenBitEnable:hsw
> > > > > >
> > > > > > Reviewed-by: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx>
> > > > >
> > > > > Oh actually it says palette won't work when this bit is on. I'm assuming
> > > > > that's the texture palette. Do we have any use of that anywhere?
> > > >
> > > > That's a good point. 3DSTATE_SAMPLER_PALETTE_LOAD and the A8P8/indexed
> > > > formats aren't used by Mesa or xf86-video-intel, but it looks like they might
> > > > be used by libva.
> > > >
> > > > Can someone confirm that libva does use the sampler palette?
> > > >
> > > > If they do, what do we do about it?
> > >
> > > I suppose the best option then would be to use an LRI from a batch,
> > > which means the register would need to be added to the cmd parser
> > > white list. This is one of the context saved registers so doing the
> > > LRI just once per context should be enough.
> >
> > I don't like that solution. For one, it's impossible - you can't LRI from
> > userspace batches, even if you add it to the kernel command parser's
> > whitelist, because the hardware scanner is still enabled. Given that I've
> > been waiting two years for this capability, I want to find a more immediate
> > solution.
>
> Ah. I've somehow convinced myself the cmd parser might actually be doing
> something besides just eating CPU cycles these days. But I guess not.
>
> >
> > Another option is to have some sort of execbuf flag...maybe a 3D/Media "usage"
> > flag. If set to 3D, write 0x6000200...if media, write 0x6000000. Or
> > something specific. I do hate adding more junk to the execbuf path, though.
> >
> > Other ideas?
>
> Fast vs. slow flag? :)
>
> More seriously, one somewhat crappy option would be to initialize that
> bit to 1 for all explicit contexts, and then have the kernel always turn
> it off before executing something with the default context. It's not
> unlike how we imagined the RS stuff would work since old userspace
> doesn't know to turn RS off when using the default context.

Interesting idea - that might work. We don't need mid-batch changes either.

I don't think HALF_SLICE_CHICKEN3 is part of the logical context, FWIW.

Before we get too much further...we should check if libva is actually broken.
I don't know if this means the sampler palette completely doesn't work, or if
it just means sample_c doesn't work with the palette. If it's the latter,
we're probably fine, because I doubt libva uses sample_c.

--Ken
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx