On Sat, Jun 8, 2013 at 10:41 PM, Carsten Emde <C.Emde at osadl.org> wrote: > On 04/08/2013 11:54 AM, Daniel Vetter wrote: >> >> On Fri, Apr 05, 2013 at 09:03:27AM -0700, Jesse Barnes wrote: >>> >>> On Thu, 4 Apr 2013 21:31:03 +0100 >>> Chris Wilson <chris at chris-wilson.co.uk> wrote: >>> >>>> In order to fully serialize access to the fenced region and the update >>>> to the fence register we need to take extreme measures on SNB+, and >>>> manually flush writes to memory prior to writing the fence register in >>>> conjunction with the memory barriers placed around the register write. >>>> >>>> Fixes i-g-t/gem_fence_thrash >>>> >>>> v2: Bring a bigger gun >>>> v3: Switch the bigger gun for heavier bullets (Arjan van de Ven) >>>> v4: Remove changes for working generations. >>>> v5: Reduce to a per-cpu wbinvd() call prior to updating the fences. >>>> v6: Rewrite comments to ellide forgotten history. >>>> >>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=62191 >>>> Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk> >>>> Cc: Jon Bloomfield <jon.bloomfield at intel.com> >>>> Tested-by: Jon Bloomfield <jon.bloomfield at intel.com> (v2) >>>> Cc: stable at vger.kernel.org >>>> --- >>>> drivers/gpu/drm/i915/i915_gem.c | 28 +++++++++++++++++++++++----- >>>> 1 file changed, 23 insertions(+), 5 deletions(-) >>>> >>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c >>>> b/drivers/gpu/drm/i915/i915_gem.c >>>> index fa4ea1a..8f7739e 100644 >>>> --- a/drivers/gpu/drm/i915/i915_gem.c >>>> +++ b/drivers/gpu/drm/i915/i915_gem.c >>>> @@ -2689,17 +2689,35 @@ static inline int fence_number(struct >>>> drm_i915_private *dev_priv, >>>> return fence - dev_priv->fence_regs; >>>> } >>>> >>>> +static void i915_gem_write_fence__ipi(void *data) >>>> +{ >>>> + wbinvd(); >>>> +} >>>> + >>>> static void i915_gem_object_update_fence(struct drm_i915_gem_object >>>> *obj, >>>> struct drm_i915_fence_reg >>>> *fence, >>>> bool enable) >>>> { >>>> - struct drm_i915_private *dev_priv = obj->base.dev->dev_private; >>>> - int reg = fence_number(dev_priv, fence); >>>> - >>>> - i915_gem_write_fence(obj->base.dev, reg, enable ? obj : NULL); >>>> + struct drm_device *dev = obj->base.dev; >>>> + struct drm_i915_private *dev_priv = dev->dev_private; >>>> + int fence_reg = fence_number(dev_priv, fence); >>>> + >>>> + /* In order to fully serialize access to the fenced region and >>>> + * the update to the fence register we need to take extreme >>>> + * measures on SNB+. In theory, the write to the fence register >>>> + * flushes all memory transactions before, and coupled with the >>>> + * mb() placed around the register write we serialise all memory >>>> + * operations with respect to the changes in the tiler. Yet, on >>>> + * SNB+ we need to take a step further and emit an explicit >>>> wbinvd() >>>> + * on each processor in order to manually flush all memory >>>> + * transactions before updating the fence register. >>>> + */ >>>> + if (HAS_LLC(obj->base.dev)) >>>> + on_each_cpu(i915_gem_write_fence__ipi, NULL, 1); >>>> + i915_gem_write_fence(dev, fence_reg, enable ? obj : NULL); >>>> >>>> if (enable) { >>>> - obj->fence_reg = reg; >>>> + obj->fence_reg = fence_reg; >>>> fence->obj = obj; >>>> list_move_tail(&fence->lru_list, >>>> &dev_priv->mm.fence_list); >>>> } else { >>> >>> >>> I almost wish x86 just gave up and went fully weakly ordered. At least >>> then we'd know we need barriers everywhere, rather than just in >>> mystical places. >>> >>> Reviewed-by: Jesse Barnes <jbarnes at virtuousgeek.org> >> >> >> Queued for -next, thanks for the patch. I am a bit unhappy that the >> testcase doesn't work too well and can't reliably reproduce this on all >> affected machines. But I've tried a bit to improve things and it's very >> fickle. I guess we just have to accept this :( > > > Under real-time conditions when we expect deterministic response to > external and internal events at any time within a couple of > microseconds, invalidating and flushing the entire cache in a running > system is unacceptable. This is introducing latencies of several > milliseconds which was clearly shown in RT regression tests on a kernel > with this patch applied (https://www.osadl.org/?id=1543#c7602). We > therefore have to revert it in the RT patch queue - kind of a > workaround of a workaround. > > Would simply be wonderful, if we could get rid of the hateful wbinvd(). I'm somewhat surprised people have not started to scream earlier about this one ;-) We're trying to figure out whether there's a less costly w/a (we have some benchmark where it kills performance, too) but until that's done we pretty much need to stick with it. If you want to avoid any bad side-effects due to that w/a you can alternatively pin all gpu rendering tasks to the same cpu core/thread. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch