On Tue, Aug 06, 2013 at 05:24:25PM +0300, Ville Syrjälä wrote: > On Tue, Aug 06, 2013 at 01:17:04PM +0100, Chris Wilson wrote: > > +static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj) > > +{ > > + if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level)) > > + return true; > > + > > + return atomic_read(&obj->fb_count) > 0; > > +} > > I was thinking a bit about this fb_count thing. The other option would > be to track actual scanout activity, which wouldn't be hard to do, but > I guess that could shift a bunch of the clflush cost to page flip time. > So maybe we don't want that? Different series, see the frontbuffer write tracking. As for page flip, that is currently the only time we clflush... > Also i915_gem_sw_finish_ioctl() still looks at pin_count as an > indication whether scanout is happening. We should add an fb_count > check there as well. Maybe we should leave the pin_count check there, > so that framebuffers that aren't being scanned out currently can be > massaged a bit more freely before a clflush needs to be issued? It does do a fb_count check, the pin_count + fb_count is a really strong indicator of active scanout. But sw_finish is truly legacy, so it just has to work and be as simple as possible. > > - if (obj->cache_level == I915_CACHE_NONE && > > - obj->tiling_mode == I915_TILING_NONE && > > - obj->base.write_domain != I915_GEM_DOMAIN_CPU) { > > + if (obj->tiling_mode == I915_TILING_NONE && > > + !cpu_write_needs_clflush(obj)) { > > This would break non-LLC platforms, I think. Before, GTT writes were > used for non-snooped buffers w/ clean CPU caches (write domain != cpu), > which makes sense since access via GTT doesn't snoop ever according > to the docs. Now I think it'll do GTT writes for snooped non-scanout > buffers. D'oh. What I was aiming for was to always use shmem writes on snooped bo, irrespective of whether it is GPU hot. s/!cpu_write_needs_clflush/cpu_write_needs_clflush/ > > > ret = i915_gem_gtt_pwrite_fast(dev, obj, args, file); > > /* Note that the gtt paths might fail with non-page-backed user > > * pointers (e.g. gtt mappings when moving data between > > @@ -3299,7 +3305,7 @@ i915_gem_clflush_object(struct drm_i915_gem_object *obj) > > * snooping behaviour occurs naturally as the result of our domain > > * tracking. > > */ > > - if (obj->cache_level != I915_CACHE_NONE) > > + if (!cpu_write_needs_clflush(obj)) > > return; > > I would actually prefer to kill this check completely, and move the > decision whether to do a clflush to the callers. I saw, I differ in opinion. > There are three places that I spotted which lack such a check; > i915_gem_execbuffer_move_to_gpu() and i915_gem_object_set_to_cpu_domain() > need to flush only when !cpu_cache_is_coherent(), whereas > i915_gem_object_flush_cpu_write_domain() needs to flush for scanout as > well. I like the approach of trying to keep as much of the complexity inside i915_gem.c and exporting simple rules to the callers. > > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c > > index 0584480..cc5eaba 100644 > > --- a/drivers/gpu/drm/i915/intel_display.c > > +++ b/drivers/gpu/drm/i915/intel_display.c > > @@ -9443,6 +9443,8 @@ static void intel_user_framebuffer_destroy(struct drm_framebuffer *fb) > > struct intel_framebuffer *intel_fb = to_intel_framebuffer(fb); > > > > drm_framebuffer_cleanup(fb); > > + > > + atomic_dec(&intel_fb->obj->fb_count); > > Where did the the other atomic_dec() go (was in intel_fbdev_destroy())? Missed. I blame the lack of intel_framebuffer_fini(), that would be a better patch first. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx