On Mon, Apr 08, 2013 at 07:57:40AM -0700, Kenneth Graunke wrote:
> On 04/08/2013 06:28 AM, Chris Wilson wrote:
> >Enabling context support increases SwapBuffers latency by about 20%
> >(measured on an i7-3720qm). We can offset that loss slightly by enabling
> >faster caching for the contexts. As they are not backed by any
> >particular cache (such as the sampler or render caches) our only option
> >is to select the generic mid-level cache. This reduces the latency of
> >the swap by about 5%.
> >
> >Oddly this effect can be observed running smokin-guns on IVB at
> >1280x1024:
> >Using BLT copies for swaps: 151.67 fps
> >Using Render copies for swaps (unpatched): 141.70 fps
> >With contexts disabled: 150.23 fps
> >With contexts in L3$: 150.77 fps
> >
> >Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> >Cc: Ben Widawsky <ben at bwidawsk.net>
> >Cc: Kenneth Graunke <kenneth at whitecape.org>
> >---
> > drivers/gpu/drm/i915/i915_gem_context.c | 7 +++++++
> > 1 file changed, 7 insertions(+)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> >index 94d873a..a1e8ecb 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_context.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >@@ -152,6 +152,13 @@ create_hw_context(struct drm_device *dev,
> > 		return ERR_PTR(-ENOMEM);
> > 	}
> >
> >+	if (INTEL_INFO(dev)->gen >= 7) {
> >+		ret = i915_gem_object_set_cache_level(ctx->obj,
> >+						      I915_CACHE_LLC_MLC);
> >+		if (ret)
> >+			goto err_out;
> >+	}
> >+
> > 	/* The ring associated with the context object is handled by the normal
> > 	 * object tracking code. We give an initial ring value simple to pass an
> > 	 * assertion in the context switch code.
>
> Sounds good to me, Chris. Is this also useful on Sandybridge? I
> don't see why we couldn't use both L3 & LLC there as well.

Snb doesn't have gpu l3 afaik.

>
> Reviewed-by: Kenneth Graunke <kenneth at whitecape.org>
>
> On a tangent, it would be nice to clean up the I915_CACHE_* enums.
> The term "MLC" isn't really used in the modern documentation, and
> there's even text that says "Ivybridge doesn't have MLC". But what
> it really means here is LLC + L3. It's probably not worth reworking
> until we can send out the new Haswell bits, though.

Yeah, my idea is that internally we split this into cache_level + flags,
where cache_level controls how coherency works (i.e. where we have to
clflush and do similar magic) and the flags do the fine-tuning (like
enabling l3 caching). But like you've said, this can wait for more hsw
magic (or gfdt) landing.

Queued for -next, thanks for the patch.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
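
For illustration only, the cache_level + flags split floated in the reply above could look roughly like the sketch below. Every name here (the `sketch_` prefix, the enum values, the flag bit, the helper functions) is hypothetical and is not part of i915; the only things taken from the thread are the idea that the level decides whether clflush is needed, that a flag enables L3 caching, and that the patch applies the L3 setting on gen >= 7.

```c
#include <stdbool.h>

/* Hypothetical sketch, not i915 code: coherency level kept separate
 * from tuning flags, as suggested in the thread. */
enum sketch_cache_level {
	SKETCH_CACHE_NONE,	/* assumed: CPU must clflush for coherency */
	SKETCH_CACHE_LLC,	/* assumed: coherent via the shared LLC */
};

/* Hypothetical fine-tuning flag: additionally cache in the GPU L3. */
#define SKETCH_CACHE_FLAG_L3	(1u << 0)

struct sketch_cache_policy {
	enum sketch_cache_level level;	/* drives coherency handling */
	unsigned int flags;		/* fine-tuning only */
};

/* Coherency decisions look at the level and ignore the flags. */
static bool sketch_needs_clflush(const struct sketch_cache_policy *p)
{
	return p->level == SKETCH_CACHE_NONE;
}

static bool sketch_uses_l3(const struct sketch_cache_policy *p)
{
	return (p->flags & SKETCH_CACHE_FLAG_L3) != 0;
}

/* Mirrors the gen >= 7 check in the patch: only those parts get the
 * L3 flag; the coherency level of the base policy is left untouched. */
static struct sketch_cache_policy
sketch_tune_context(struct sketch_cache_policy base, int gen)
{
	if (gen >= 7)
		base.flags |= SKETCH_CACHE_FLAG_L3;
	return base;
}
```

With a split like this, "LLC + L3" would be expressed as the LLC level plus the L3 flag rather than as a separate enum value, which is one way to avoid the "MLC" naming discussed above.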