On Tue, Aug 06, 2013 at 07:45:40PM +0100, Chris Wilson wrote: > On Tue, Aug 06, 2013 at 09:03:01PM +0300, Ville Syrjälä wrote: > > On Tue, Aug 06, 2013 at 05:16:14PM +0100, Chris Wilson wrote: > > > On Tue, Aug 06, 2013 at 05:24:25PM +0300, Ville Syrjälä wrote: > > > > On Tue, Aug 06, 2013 at 01:17:04PM +0100, Chris Wilson wrote: > > > > > +static bool cpu_write_needs_clflush(struct drm_i915_gem_object *obj) > > > > > +{ > > > > > + if (!cpu_cache_is_coherent(obj->base.dev, obj->cache_level)) > > > > > + return true; > > > > > + > > > > > + return atomic_read(&obj->fb_count) > 0; > > > > > +} > > > > > > > > I was thinking a bit about this fb_count thing. The other option would > > > > be to track actual scanout activity, which wouldn't be hard to do, but > > > > I guess that could shift a bunch of the clflush cost to page flip time. > > > > So maybe we don't want that? > > > > > > Different series, see the frontbuffer write tracking. As for page flip, > > > that is currently the only time we clflush... > > > > > > > Also i915_gem_sw_finish_ioctl() still looks at pin_count as an > > > > indication whether scanout is happening. We should add an fb_count > > > > check there as well. Maybe we should leave the pin_count check there, > > > > so that framebuffers that aren't being scanned out currently can be > > > > massaged a bit more freely before a clflush needs to be issued? > > > > > > It does do a fb_count check, the pin_count + fb_count is a really strong > > > indicator of active scanout. > > > > The fb_count checks is done only through the eventual check in > > i915_gem_clflush_object() which is why I didn't think of it. > > > > On non-LLC platforms it also ends up doing the flush for all > > non-snooped non-scanout buffers, which would get optimized away if the > > fb_count check was where the pin_count check is. > > But it does still do the fb_count check before the flush? What am I > missing here? I believe the code now ends up doing this (on non-LLC): if (pin_count && (cache_level == UC || fb_count > 0) && write_domain == CPU) clflush(); So it'll flush also all pinned non-snooped buffers whether they be scanout or not. We could delay such flushes until the bo gets moved away from the CPU write domain. But maybe it's just a nonsense scenario. Why would someone write to a pinned bo unless the pinning is due to scanout? > > > > > - obj->tiling_mode == I915_TILING_NONE && > > > > > - obj->base.write_domain != I915_GEM_DOMAIN_CPU) { > > > > > + if (obj->tiling_mode == I915_TILING_NONE && > > > > > + !cpu_write_needs_clflush(obj)) { > > > > > > > > This would break non-LLC platforms, I think. Before, GTT writes were > > > > used for non-snooped buffers w/ clean CPU caches (write domain != cpu), > > > > which makes sense since access via GTT doesn't snoop ever according > > > > to the docs. Now I think it'll do GTT writes for snooped non-scanout > > > > buffers. > > > > > > D'oh. What I was aiming for was to always use shmem writes on snooped > > > bo, irrespective of whether it is GPU hot. > > > > > > s/!cpu_write_needs_clflush/cpu_write_needs_clflush/ > > > > So that means a GTT write for anything non-snooped or scanout. If it's > > snooped scanout (I guess we may not really care about such a weird > > combination), or non-snooped but in CPU write domain, that would > > still have issues with possibly dirty data in the CPU caches. > > Hmm, that should be fixed up by i915_gem_gtt_pwrite_fast() as we flush > all the caches before we start overwriting. I've thoroughly confused > myself over which paths do the fancy tricks. :( > > > Might be intersting to try on real hardware what the GTT access > > snooping behaviour really is, especially for some modern non-LLC > > system like VLV. > > I remember why I have the !DOMAIN_CPU test now. If we have already paid > the price to pull it back into the CPU domain, let it rest there until > it is used again. > > So as it stands, I think we want: > > if (obj->tiling_mode == I915_TILING_NONE && > obj->base.write_domain != I915_GEM_DOMAIN_CPU && > cpu_write_needs_clflush(obj)) > i915_gem_gtt_pwrite_fast(); Yeah that looks better to me at least. -- Ville Syrjälä Intel OTC _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx