On Mon, Aug 05, 2013 at 11:13:24PM +0300, ville.syrjala@xxxxxxxxxxxxxxx wrote: > From: Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> > > A bit more food for the cache discussions. My idea here is to track > whether an object is part of any fb, and in case it is we assume it > can be used for scanout, and thus may need some extra clflushes. > But otherwise we'd try to avoid clflushes on LLC plarforms. > > I also included some GFDT stuff in there just to think a bit about how > it would work, but I left out the GFDT flushes themselves. > > Also I didn't even boot-test this, so it may explode quite > spectacularly. > --- > drivers/gpu/drm/i915/i915_drv.h | 5 +++ > drivers/gpu/drm/i915/i915_gem.c | 63 +++++++++++++++++++++++------- > drivers/gpu/drm/i915/i915_gem_execbuffer.c | 13 +++++- > drivers/gpu/drm/i915/i915_gem_gtt.c | 6 ++- > drivers/gpu/drm/i915/intel_display.c | 2 + > drivers/gpu/drm/i915/intel_fb.c | 1 + > 6 files changed, 72 insertions(+), 18 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index 8b14e22..ec275ed 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -451,6 +451,8 @@ enum i915_cache_level { > I915_CACHE_NONE = 0, > I915_CACHE_LLC, > I915_CACHE_LLC_MLC, /* gen6+, in docs at least! */ > + I915_CACHE_WT, /* hsw gt3e */ > + I915_CACHE_GFDT = 0x4, /* SNB/IVB: flag ORed w/ the cache mode */ > }; > > typedef uint32_t gen6_gtt_pte_t; > @@ -1388,6 +1390,8 @@ struct drm_i915_gem_object { > > /** for phy allocated objects */ > struct drm_i915_gem_phys_object *phys_obj; > + > + atomic_t fb_count; > }; > #define to_gem_object(obj) (&((struct drm_i915_gem_object *)(obj))->base) > > @@ -1494,6 +1498,7 @@ struct drm_i915_file_private { > #define HAS_VEBOX(dev) (INTEL_INFO(dev)->has_vebox_ring) > #define HAS_LLC(dev) (INTEL_INFO(dev)->has_llc) > #define I915_NEED_GFX_HWS(dev) (INTEL_INFO(dev)->need_gfx_hws) > +#define HAS_GFDT(dev) (0) > > #define HAS_HW_CONTEXTS(dev) (INTEL_INFO(dev)->gen >= 6) > #define HAS_ALIASING_PPGTT(dev) (INTEL_INFO(dev)->gen >=6 && !IS_VALLEYVIEW(dev)) > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index bd6eb64..38d3241 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -394,6 +394,20 @@ shmem_pread_slow(struct page *page, int shmem_page_offset, int page_length, > return ret ? - EFAULT : 0; > } > > +static bool cpu_caches_coherent(struct drm_i915_gem_object *obj) > +{ > + return HAS_LLC(obj->base.dev) || obj->cache_level != I915_CACHE_NONE; > +} Yes, that much better expresses what we mean. > +static bool > +scanout_needs_clflush(struct drm_i915_gem_object *obj, > + enum i915_cache_level cache_level) > +{ > + return cache_level != I915_CACHE_LLC && > + cache_level != I915_CACHE_LLC_MLC && > + atomic_read(&obj->fb_count) > 0; > +} (Aside, this also a good time to change LLC_MLC to L3!) GFDT, similarly to WT, I think is purely a bit set on the GPU side. My experience was that I could not dirty with the CPU then issue a GPU GFDT flush without seeing cache dirt on the scanout. Actually, I think I just found my issue in the SNB bspec... Apparently, I must not dirty the surface before the previous flush is complete - which is solvable by using the active render tracking. Just another little hoop to jump through for GFDT. I think we really want to see the llc/scanout patch first so that we can give it a thorough test and debunking. Then WT as it is fairly trivial and then seeing if we can finally get GFDT right. -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx