On Thu, Apr 04, 2013 at 11:01:02AM +0200, Daniel Vetter wrote: > On Wed, Mar 06, 2013 at 04:28:09PM +0000, Chris Wilson wrote: > > From: Ville Syrj?l? <ville.syrjala at linux.intel.com> > > > > Currently all scanout buffers must be uncached because the > > display controller doesn't snoop the LLC. SNB introduced another > > method to guarantee coherency for the display controller. It's > > called the GFDT or graphics data type. > > > > Pages that have the GFDT bit enabled in their PTEs get flushed > > all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is > > issued with the "synchronize GFDT" bit set. > > > > So rather than making all scanout buffers uncached, set the GFDT > > bit in their PTEs, and modify the ring flush functions to enable > > the "synchronize GFDT" bit. > > > > On HSW the GFDT bit was removed from the PTE, and it's only present in > > surface state, so we can't really set it from the kernel. Also the docs > > state that the hardware isn't actually guaranteed to respect the GFDT > > bit. So it looks like GFDT isn't all that useful on HSW. > > > > So far I've tried this very quickly on an IVB machine, and > > it seems to be working as advertised. No idea if it does any > > good though. > > > > On an i5-2520m (laptop) running gnome-shell at 1366x768: > > padman 140.78 -> 145.98 fps > > openarena 183.72 -> 186.87 fps > > gtkperf ComboBoxEntry 20.27 -> 22.14s > > gtkperf pixbuf 1.12 -> 1.47s > > x11perf -aa10text 13.40 -> 13.20 Mglyphs > > which are well within the throttling noise. > > > > v2 [ickle]: adapt to comply with existing userspace guarantees > > > > Signed-off-by: Ville Syrj?l? <ville.syrjala at linux.intel.com> > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk> > > Oops, this one here somehow fell a bit through the cracks. > > Two bikesheds and one real issue: > - s/bool gfdt/unsigned caching_flags/ for the gtt pte punchers. I expect > more fun to come here ;-) > - ring->flush has a pile of arguments, I guess we should coalesce > invalidate/flush and the new internal into a simple flags array. I don't > expect that we'll every switch invalidate/flush back to domain arrays > from the current "it's just a bool, really" usage. > > Also, the above should be done in separate prep patches imo. > > The real deal is flushing cpu access, since that probably does not set the > gfdt flag. So cpu reads are fine and don't require any flushing, but cpu > writes must be clflushed I think. In a way gfdt works as if the gpu is > doing write-through caching (safe that we have to manually initiate the > write-through with the gfdt flush). But from the pov of cpu access it's > the same, and hence requires the same amount of clflushing. > > Hence I'm leaning towards adding a new I915_CACHING_WT cache_level. Ofc > for gfdt we still need to keep track of the gfdt_dirty state, but all the > cpu side flushing business should be much clearer. > > Stumbled over this while checking whether sw_finish_ioctl would still do > the right thing, and noticed that the clflush is a noop when gfdt is > treated like the current CACHE_LLC. My original patch had a change to clflushing where pinned objects w/ gfdt were also flushed. I didn't really read v2 well enough to figure out why that got dropped. -- Ville Syrj?l? Intel OTC