[PATCH] drm/i915: GFDT support for SNB/IVB

ville.syrjala at linux.intel.com (Ville Syrjälä) · Mon, 4 Mar 2013 16:03:45 +0200

On Sun, Mar 03, 2013 at 05:28:52PM +0100, Daniel Vetter wrote:
> On Fri, Mar 01, 2013 at 08:32:57PM +0200, ville.syrjala at linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala at linux.intel.com>
> > 
> > Currently all scanout buffers must be uncached because the
> > display controller doesn't snoop the LLC. SNB introduced another
> > method to guarantee coherency for the display controller. It's
> > called the GFDT or graphics data type.
> > 
> > Pages that have the GFDT bit enabled in their PTEs get flushed
> > all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is
> > issued with the "synchronize GFDT" bit set.
> > 
> > So rather than making all scanout buffers uncached, set the GFDT
> > bit in their PTEs, and modify the ring flush functions to enable
> > the "synchronize GFDT" bit.
> > 
> > On HSW the GFDT bit was removed from the PTE, and it's only present in
> > surface state, so we can't really set it from the kernel. Also the docs
> > state that the hardware isn't actually guaranteed to respect the GFDT
> > bit. So it looks like GFDT isn't all that useful on HSW.
> > 
> > So far I've tried this very quickly on an IVB machine, and
> > it seems to be working as advertised. No idea if it does any
> > good though.
> > 
> > TODO:
> > - make sure there are no missing flushes (CPU access doesn't
> >   respect GFDT after all).
> > - measure it and see if there's some real benefit
> > - maybe we can track whether "synchronize GFDT" needs to be
> >   issued, and skip it when possible. needs some numbers to
> >   determine if it's worthwhile.
> > 
> > Signed-off-by: Ville Syrjälä <ville.syrjala at linux.intel.com>
> 
> IIRC when I tried this out a while back it regressed a few benchmarks.
> Chris and I suspected cache thrashing, but that's hard to tell without
> proper instrumentation support. Chris played around with a few tricks to
> mark other giant bos as uncacheable, but he couldn't find any improved
> workloads.

I see. I didn't realize this was tried already. Not that I really
planned to implement this in the first place. I was just studying the
code and figured I'd learn better by trying to change things a bit
;) But if there's interest I can of course try to improve it further.

> In short I think this needs to come with decent amounts of numbers
> attached, like the TODO says ;-)
> 
> The other thing was that I didn't manage to get things to work properly,
> leaving some random cacheline dirt on the screen. Looking at your code, you
> add the gfdt flush to every ring_flush, whereas I tried to be clever
> and only flushed after batches rendering to the frontbuffer. So probably a
> bug in my code, or a flush on a given ring doesn't flush out caches for
> one of the other engines.

I had a bug in my first attempt where I forgot the initial clflush. That
left some random crap on the screen until I rendered the next frame w/
GFDT already enabled. Other than that I didn't see any corruption except
when I intentionally left out the flushes. But as stated, my code
probably does too many flushes.
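
To make the "skip it when possible" idea from the TODO a bit more
concrete, here is a minimal sketch in C of per-ring tracking. Everything
in it is hypothetical: the sketch_* names, the struct and the flag values
are made up for illustration and are neither the i915 code nor the real
MI_FLUSH_DW/PIPE_CONTROL encodings. It also assumes that a flush on a ring
is enough to cover that ring's own writes, which is exactly the open
question about the other engines raised above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; not real hardware bits. */
#define SKETCH_FLUSH_BASE      (1u << 0)
#define SKETCH_FLUSH_SYNC_GFDT (1u << 1)

/* Hypothetical per-ring state: has a GFDT-tagged buffer been written
 * since the last "synchronize GFDT" flush was emitted on this ring? */
struct sketch_ring {
	bool gfdt_dirty;
};

/* Call when a batch is queued; writes_scanout says whether the batch
 * may write to a buffer whose PTEs have the GFDT bit set. */
static void sketch_mark_batch(struct sketch_ring *ring, bool writes_scanout)
{
	assert(ring != NULL);
	if (writes_scanout)
		ring->gfdt_dirty = true;
}

/* Build the flags for the next flush. The GFDT sync is requested only
 * when something is pending, and the pending state is then cleared. */
static uint32_t sketch_flush_flags(struct sketch_ring *ring)
{
	uint32_t flags = SKETCH_FLUSH_BASE;

	assert(ring != NULL);
	if (ring->gfdt_dirty) {
		flags |= SKETCH_FLUSH_SYNC_GFDT;
		ring->gfdt_dirty = false;
	}
	return flags;
}
```

With this, a flush that follows a batch touching a scanout buffer carries
the GFDT sync, and later flushes on the same ring go back to the plain
variant until another such batch is marked. Whether the bookkeeping pays
off would still need the numbers mentioned in the TODO.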

-- 
Ville Syrjälä
Intel OTC