According to bspec, ROW_CHICKEN3's bit 6 which is to disable move of cacheable global atomics to L3 is needed for GT3 D stepping. I enabled it and tested it with HSW GT2 D stepping and GT3 E stepping. The atomics works fine in beignet. And it could get more than 10x performance improvement with some workload, for an example, the "splat" kernel in darktable, without this patch, it consumes 50 seconds in one large raw picture processing. But with this patch, the same process only takes less than 1 second. Signed-off-by: Zhigang Gong <zhigang.gong@xxxxxxxxx> --- drivers/gpu/drm/i915/intel_pm.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c index 7d99a9c..8a27802 100644 --- a/drivers/gpu/drm/i915/intel_pm.c +++ b/drivers/gpu/drm/i915/intel_pm.c @@ -5938,10 +5938,12 @@ static void haswell_init_clock_gating(struct drm_device *dev) ilk_init_lp_watermarks(dev); - /* L3 caching of data atomics doesn't work -- disable it. */ - I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE); - I915_WRITE(HSW_ROW_CHICKEN3, - _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE)); + if (IS_HSW_GT3(dev) && dev->pdev->revision <= 6) { + /* L3 caching of data atomics doesn't work -- disable it. */ + I915_WRITE(HSW_SCRATCH1, HSW_SCRATCH1_L3_DATA_ATOMICS_DISABLE); + I915_WRITE(HSW_ROW_CHICKEN3, + _MASKED_BIT_ENABLE(HSW_ROW_CHICKEN3_L3_GLOBAL_ATOMICS_DISABLE)); + } /* This is required by WaCatErrorRejectionIssue:hsw */ I915_WRITE(GEN7_SQ_CHICKEN_MBCUNIT_CONFIG, -- 1.8.3.2 _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx