Francisco Jerez <currojerez@xxxxxxxxxx> writes:

> Daniel Vetter <daniel@xxxxxxxx> writes:
>
>> On Wed, Jan 11, 2017 at 12:24:59PM +0000, Chris Wilson wrote:
>>> On Wed, Jan 11, 2017 at 02:07:37PM +0200, Mika Kuoppala wrote:
>>> > Daniel Vetter <daniel@xxxxxxxx> writes:
>>> >
>>> > > On Mon, Jan 09, 2017 at 01:07:56PM -0800, Francisco Jerez wrote:
>>> > >> The WaDisableLSQCROPERFforOCL workaround has the side effect of
>>> > >> disabling an L3SQ optimization that has huge performance implications
>>> > >> and is unlikely to be necessary for the correct functioning of usual
>>> > >> graphic workloads. Userspace is free to re-enable the workaround on
>>> > >> demand, and is generally in a better position to determine whether the
>>> > >> workaround is necessary than the DRM is (e.g. only during the
>>> > >> execution of compute kernels that rely on both L3 fences and HDC R/W
>>> > >> requests).
>>> > >>
>>> > >> The same workaround seems to apply to BDW (at least to production
>>> > >> stepping G1) and SKL as well (the internal workaround database claims
>>> > >> that it does for all steppings, while the BSpec workaround table only
>>> > >> mentions pre-production steppings), but the DRM doesn't do anything
>>> > >> beyond whitelisting the L3SQCREG4 register so userspace can enable it
>>> > >> when it sees fit. Do the same on KBL platforms.
>>> > >>
>>> > >> Improves performance of the GFXBench4 gl_manhattan31 benchmark by 60%,
>>> > >> and gl_4 (AKA car chase) by 14% on a KBL GT2 running Mesa master --
>>> > >> This is followed by a regression of 35% and 10% respectively for the
>>> > >> same benchmarks and platform caused by my recent patch series
>>> > >> switching userspace to use the dataport constant cache instead of the
>>> > >> sampler to implement uniform pull constant loads, which caused us to
>>> > >> hit more heavily the L3 cache (and on platforms other than KBL had the
>>> > >> opposite effect of improving performance of the same two benchmarks).
>>> > >> The overall effect on KBL of this change combined with the recent
>>> > >> userspace change is respectively 4.6% and 2.6%. SynMark2 OglShMapPcf
>>> > >> was affected by the constant cache changes (though it improved as it
>>> > >> did on other platforms rather than regressing), but is not
>>> > >> significantly affected by this patch (with statistical significance of
>>> > >> 5% and sample size 20).
>>> > >>
>>> > >> v2: Drop some more code to avoid unused variable warning.
>>> > >>
>>> > >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=99256
>>> > >> Signed-off-by: Francisco Jerez <currojerez@xxxxxxxxxx>
>>> > >> Cc: Eero Tamminen <eero.t.tamminen@xxxxxxxxx>
>>> > >> Cc: Jani Nikula <jani.nikula@xxxxxxxxx>
>>> > >> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>
>>> > >> Cc: beignet@xxxxxxxxxxxxxxxxxxxxx
>>> > >
>>> > > Don't we need some userspace flag/opt-in scheme to avoid stuff going boom
>>> > > for compute kernels? Are the patches for mesa compute/beignet
>>> > > ready&reviewed?
>>> >
>>> > This is explicit setting on kbl/E0 only. So one could argue
>>> > that unless they filter based on PCI-IDs, things would already
>>> > blow up across the skl/kbl population, if they forgot
>>> > to set it. The whitelisting is in place and looks sane
>>> > so this E0 exception is a wart that got in by me reading wa
>>> > database slavishly without thinking.
>>>
>>> Add Fixes then?
>>
>> Yeah, cc: stable would be good to make sure it shows up in all supported
>> kernels, fast. Otherwise we'll get some good wtf bug reports.
>
> Agreed -- It would be nice for this to get to stable kernel branches.
>

Added Fixes and stable tags and pushed to drm-intel-next-queued.

Thanks for patch,
-Mika

>> -Daniel
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx