On Thu, 2016-06-09 at 12:29 +0100, Chris Wilson wrote:
> If we quickly switch from writing through the GTT to a read of the
> physical page directly with the CPU (e.g. performing relocations through
> the GTT and then running the command parser), we can observe that the
> writes are not visible to the CPU. It is not a coherency problem, as
> extensive investigations with clflush have demonstrated, but a mere
> timing issue - we have to wait for the GTT to complete its write before
> we start our read from the CPU.
>
> The issue can be illustrated in userspace with:
>
> 	gtt = gem_mmap__gtt(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
> 	cpu = gem_mmap__cpu(fd, handle, 0, OBJECT_SIZE, PROT_READ | PROT_WRITE);
> 	gem_set_domain(fd, handle, I915_GEM_DOMAIN_GTT, I915_GEM_DOMAIN_GTT);
>
> 	for (i = 0; i < OBJECT_SIZE / 64; i++) {
> 		int x = 16*i + (i%16);
> 		gtt[x] = i;
> 		clflush(&cpu[x], sizeof(cpu[x]));
> 		assert(cpu[x] == i);
> 	}
>
> Experimenting with that shows that this behaviour is indeed limited to
> recent Atom-class hardware.
>
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_gem.c | 12 +++++++++++-
>  1 file changed, 11 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 18b4a684ddde..ffe3d3e9d69d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2898,20 +2898,30 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj,
>  static void
>  i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj)
>  {
> +	struct drm_i915_private *dev_priv = to_i915(obj->base.dev);
>  	uint32_t old_write_domain;
>
>  	if (obj->base.write_domain != I915_GEM_DOMAIN_GTT)
>  		return;
>
>  	/* No actual flushing is required for the GTT write domain. Writes
> -	 * to it immediately go to main memory as far as we know, so there's
> +	 * to it "immediately" go to main memory as far as we know, so there's
>  	 * no chipset flush. It also doesn't land in render cache.
>  	 *
>  	 * However, we do have to enforce the order so that all writes through
>  	 * the GTT land before any writes to the device, such as updates to
>  	 * the GATT itself.
> +	 *
> +	 * We also have to wait a bit for the writes to land from the GTT.
> +	 * An uncached read (i.e. mmio) seems to be ideal for the round-trip
> +	 * timing. This issue has only been observed when switching quickly
> +	 * between GTT writes and CPU reads from inside the kernel on recent hw,
> +	 * and it appears to only affect discrete GTT blocks (i.e. on LLC
> +	 * system agents we cannot reproduce this behaviour).

This screams for a Tested-by: tag before merging...

>  	 */
>  	wmb();
> +	if (INTEL_INFO(dev_priv)->gen >= 6 && !HAS_LLC(dev_priv))

INTEL_GEN()

With this fixed, and the Testcase: label added,

Reviewed-by: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>

> +		POSTING_READ(RING_ACTHD(dev_priv->engine[RCS].mmio_base));
>
>  	old_write_domain = obj->base.write_domain;
>  	obj->base.write_domain = 0;

-- 
Joonas Lahtinen
Open Source Technology Center
Intel Corporation
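
For readers following the quoted reproducer and the new check, here is a small standalone sketch of the two pieces of arithmetic involved. It assumes that gtt and cpu in the reproducer are uint32_t pointers (the quoted snippet does not show their declarations) and that a cacheline is 64 bytes. All function names below (probe_index, cacheline_of, slot_in_cacheline, wants_posting_read) are hypothetical helpers made up for illustration; none of them exist in the driver or in the test suite.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cachelines and 32-bit elements, i.e. 16 dwords per line. */
#define CACHELINE_BYTES 64u
#define DWORDS_PER_CACHELINE (CACHELINE_BYTES / sizeof(uint32_t))

/* Dword index touched on iteration i, copied from the quoted loop. */
static size_t probe_index(size_t i)
{
	return 16 * i + (i % 16);
}

/* Cacheline that a given dword index falls into. */
static size_t cacheline_of(size_t dword_index)
{
	return dword_index / DWORDS_PER_CACHELINE;
}

/* Position of the dword inside its cacheline (0..15). */
static size_t slot_in_cacheline(size_t dword_index)
{
	return dword_index % DWORDS_PER_CACHELINE;
}

/*
 * Hypothetical stand-in for the condition added by the patch: the extra
 * uncached read is issued only for gen >= 6 parts without an LLC.
 */
static bool wants_posting_read(int gen, bool has_llc)
{
	return gen >= 6 && !has_llc;
}
```

Under those assumptions, iteration i of the loop writes exactly one dword in cacheline i, at slot i % 16, so each write/flush/read triple exercises a fresh cacheline and the offset within the line rotates. The predicate mirrors the quoted condition only; whether that gen/LLC cut-off matches every affected platform is exactly what the requested Tested-by: would need to confirm.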