Quoting Chris Wilson (2019-07-18 10:03:34) > Since userspace has the ability to bypass the CPU cache from within its > unpriviledged command stream, we have to flush the CPU cache to memory > in order to overwrite the previous contents on creation. > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > Cc: stablevger.kernel.org > --- > drivers/gpu/drm/i915/gem/i915_gem_shmem.c | 26 ++++++----------------- > 1 file changed, 7 insertions(+), 19 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c > index d2a1158868e7..f752b326d399 100644 > --- a/drivers/gpu/drm/i915/gem/i915_gem_shmem.c > +++ b/drivers/gpu/drm/i915/gem/i915_gem_shmem.c > @@ -459,7 +459,6 @@ i915_gem_object_create_shmem(struct drm_i915_private *i915, u64 size) > { > struct drm_i915_gem_object *obj; > struct address_space *mapping; > - unsigned int cache_level; > gfp_t mask; > int ret; > > @@ -498,24 +497,13 @@ i915_gem_object_create_shmem(struct drm_i915_private *i915, u64 size) > obj->write_domain = I915_GEM_DOMAIN_CPU; > obj->read_domains = I915_GEM_DOMAIN_CPU; > > - if (HAS_LLC(i915)) > - /* On some devices, we can have the GPU use the LLC (the CPU > - * cache) for about a 10% performance improvement > - * compared to uncached. Graphics requests other than > - * display scanout are coherent with the CPU in > - * accessing this cache. This means in this mode we > - * don't need to clflush on the CPU side, and on the > - * GPU side we only need to flush internal caches to > - * get data visible to the CPU. > - * > - * However, we maintain the display planes as UC, and so > - * need to rebind when first used as such. > - */ > - cache_level = I915_CACHE_LLC; > - else > - cache_level = I915_CACHE_NONE; > - > - i915_gem_object_set_cache_coherency(obj, cache_level); > + /* > + * Note that userspace has control over cache-bypass > + * via its command stream, so even on LLC architectures > + * we have to flush out the CPU cache to memory to > + * clear previous contents. > + */ > + i915_gem_object_set_cache_coherency(obj, I915_CACHE_NONE); An alternative would be to do a GPU clear, but that requires some confidence that the first access will from the GPU (or else we pay the extra latency). Do I hear a request for placement flags in the extended create_ioctl? -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx