[PATCH] drm/i915: GFDT support for SNB/IVB

chris at chris-wilson.co.uk (Chris Wilson) · Thu, 4 Apr 2013 13:17:44 +0100



On Thu, Apr 04, 2013 at 02:27:12PM +0300, Ville Syrj?l? wrote:
> On Thu, Apr 04, 2013 at 11:01:02AM +0200, Daniel Vetter wrote:
> > On Wed, Mar 06, 2013 at 04:28:09PM +0000, Chris Wilson wrote:
> > > From: Ville Syrj?l? <ville.syrjala at linux.intel.com>
> > > 
> > > Currently all scanout buffers must be uncached because the
> > > display controller doesn't snoop the LLC. SNB introduced another
> > > method to guarantee coherency for the display controller. It's
> > > called the GFDT or graphics data type.
> > > 
> > > Pages that have the GFDT bit enabled in their PTEs get flushed
> > > all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is
> > > issued with the "synchronize GFDT" bit set.
> > > 
> > > So rather than making all scanout buffers uncached, set the GFDT
> > > bit in their PTEs, and modify the ring flush functions to enable
> > > the "synchronize GFDT" bit.
> > > 
> > > On HSW the GFDT bit was removed from the PTE, and it's only present in
> > > surface state, so we can't really set it from the kernel. Also the docs
> > > state that the hardware isn't actually guaranteed to respect the GFDT
> > > bit. So it looks like GFDT isn't all that useful on HSW.
> > > 
> > > So far I've tried this very quickly on an IVB machine, and
> > > it seems to be working as advertised. No idea if it does any
> > > good though.
> > > 
> > > On an i5-2520m (laptop) running gnome-shell at 1366x768:
> > >   padman 		140.78 -> 145.98 fps
> > >   openarena 		183.72 -> 186.87 fps
> > >   gtkperf ComboBoxEntry	20.27 -> 22.14s
> > >   gtkperf pixbuf	 1.12 ->  1.47s
> > >   x11perf -aa10text	13.40 -> 13.20 Mglyphs
> > > which are well within the throttling noise.
> > > 
> > > v2 [ickle]: adapt to comply with existing userspace guarantees
> > > 
> > > Signed-off-by: Ville Syrj?l? <ville.syrjala at linux.intel.com>
> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk>
> > 
> > Oops, this one here somehow fell a bit through the cracks.
> > 
> > Two bikesheds and one real issue:
> > - s/bool gfdt/unsigned caching_flags/ for the gtt pte punchers. I expect
> >   more fun to come here ;-)
> > - ring->flush has a pile of arguments, I guess we should coalesce
> >   invalidate/flush and the new internal into a simple flags array. I don't
> >   expect that we'll every switch invalidate/flush back to domain arrays
> >   from the current "it's just a bool, really" usage.
> > 
> > Also, the above should be done in separate prep patches imo.
> > 
> > The real deal is flushing cpu access, since that probably does not set the
> > gfdt flag. So cpu reads are fine and don't require any flushing, but cpu
> > writes must be clflushed I think. In a way gfdt works as if the gpu is
> > doing write-through caching (safe that we have to manually initiate the
> > write-through with the gfdt flush). But from the pov of cpu access it's
> > the same, and hence requires the same amount of clflushing.
> > 
> > Hence I'm leaning towards adding a new I915_CACHING_WT cache_level. Ofc
> > for gfdt we still need to keep track of the gfdt_dirty state, but all the
> > cpu side flushing business should be much clearer.
> > 
> > Stumbled over this while checking whether sw_finish_ioctl would still do
> > the right thing, and noticed that the clflush is a noop when gfdt is
> > treated like the current CACHE_LLC.
> 
> My original patch had a change to clflushing where pinned objects w/ gfdt
> were also flushed. I didn't really read v2 well enough to figure out why
> that got dropped.

Because it was the wrong approach. The API where the scanout is to be
made coherent with CPU mmaps is sw_finish. The only user of that is early
pre-gen5 DDX. So imo the original patch was abusing a generic flag and
applying the flushes at the wrong points, and I simply didn't care about
badly designed API such as sw_finish.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre