Re: [PATCHv2] drm/i915: Remove WaDisableLSQCROPERFforOCL KBL workaround.

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Wed, 11 Jan 2017 12:24:59 +0000



On Wed, Jan 11, 2017 at 02:07:37PM +0200, Mika Kuoppala wrote:
> Daniel Vetter <daniel@xxxxxxxx> writes:
> 
> > On Mon, Jan 09, 2017 at 01:07:56PM -0800, Francisco Jerez wrote:
> >> The WaDisableLSQCROPERFforOCL workaround has the side effect of
> >> disabling an L3SQ optimization that has huge performance implications
> >> and is unlikely to be necessary for the correct functioning of usual
> >> graphic workloads.  Userspace is free to re-enable the workaround on
> >> demand, and is generally in a better position to determine whether the
> >> workaround is necessary than the DRM is (e.g. only during the
> >> execution of compute kernels that rely on both L3 fences and HDC R/W
> >> requests).
> >> 
> >> The same workaround seems to apply to BDW (at least to production
> >> stepping G1) and SKL as well (the internal workaround database claims
> >> that it does for all steppings, while the BSpec workaround table only
> >> mentions pre-production steppings), but the DRM doesn't do anything
> >> beyond whitelisting the L3SQCREG4 register so userspace can enable it
> >> when it sees fit.  Do the same on KBL platforms.
> >> 
> >> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
> >> and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
> >> This is followed by a regression of 35% and 10% respectively for the
> >> same benchmarks and platform caused by my recent patch series
> >> switching userspace to use the dataport constant cache instead of the
> >> sampler to implement uniform pull constant loads, which caused us to
> >> hit more heavily the L3 cache (and on platforms other than KBL had the
> >> opposite effect of improving performance of the same two benchmarks).
> >> The overall effect on KBL of this change combined with the recent
> >> userspace change is respectively 4.6% and 2.6%.  SynMark2 OglShMapPcf
> >> was affected by the constant cache changes (though it improved as it
> >> did on other platforms rather than regressing), but is not
> >> significantly affected by this patch (with statistical significance of
> >> 5% and sample size 20).
> >> 
> >> v2: Drop some more code to avoid unused variable warning.
> >> 
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
> >> Signed-off-by: Francisco Jerez <currojerez@xxxxxxxxxx>
> >> Cc: Eero Tamminen <eero.t.tamminen@xxxxxxxxx>
> >> Cc: Jani Nikula <jani.nikula@xxxxxxxxx>
> >> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
> >> Cc: beignet@xxxxxxxxxxxxxxxxxxxxx
> >
> > Don't we need some userspace flag/opt-in scheme to avoid stuff going boom
> > for compute kernels? Are the patches for mesa compute/beignet
> > ready and reviewed?
> 
> This is an explicit setting on kbl/E0 only. So one could argue
> that unless they filter based on PCI IDs, things would already
> blow up across the skl/kbl population if they forgot
> to set it. The whitelisting is in place and looks sane,
> so this E0 exception is a wart that got in by me reading the wa
> database slavishly without thinking.

Add a Fixes: tag then?
-Chris
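
For readers checking the benchmark figures quoted in the patch description: the "overall effect" numbers come from compounding the earlier userspace regression with the improvement from this patch. A minimal sketch of that arithmetic follows, using only the rounded percentages quoted above. The helper name `combine` is hypothetical. With rounded inputs it gives roughly +4.0% for gl_manhattan31 and +2.6% for gl_4; the quoted 4.6% for gl_manhattan31 was presumably computed from unrounded measurements, which is an assumption on my part.

```python
def combine(*percent_changes):
    """Compound successive relative changes given in percent.

    Each change is applied multiplicatively, so a -35% change followed
    by a +60% change yields 0.65 * 1.60 = 1.04, i.e. +4%.
    """
    factor = 1.0
    for p in percent_changes:
        factor *= 1.0 + p / 100.0
    return (factor - 1.0) * 100.0


# Rounded figures from the patch description (KBL GT2, Mesa master):
# userspace constant-cache regression first, then this patch's gain.
manhattan = combine(-35.0, 60.0)
car_chase = combine(-10.0, 14.0)

print(f"gl_manhattan31 overall: {manhattan:+.1f}%")
print(f"gl_4 (car chase) overall: {car_chase:+.1f}%")
```

Because the changes compound multiplicatively, the order in which the two patches landed does not affect the combined figure.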

-- 
Chris Wilson, Intel Open Source Technology Centre