On Wed, Mar 06, 2013 at 04:28:09PM +0000, Chris Wilson wrote: > From: Ville Syrj?l? <ville.syrjala at linux.intel.com> > > Currently all scanout buffers must be uncached because the > display controller doesn't snoop the LLC. SNB introduced another > method to guarantee coherency for the display controller. It's > called the GFDT or graphics data type. > > Pages that have the GFDT bit enabled in their PTEs get flushed > all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is > issued with the "synchronize GFDT" bit set. > > So rather than making all scanout buffers uncached, set the GFDT > bit in their PTEs, and modify the ring flush functions to enable > the "synchronize GFDT" bit. > > On HSW the GFDT bit was removed from the PTE, and it's only present in > surface state, so we can't really set it from the kernel. Also the docs > state that the hardware isn't actually guaranteed to respect the GFDT > bit. So it looks like GFDT isn't all that useful on HSW. > > So far I've tried this very quickly on an IVB machine, and > it seems to be working as advertised. No idea if it does any > good though. > > On an i5-2520m (laptop) running gnome-shell at 1366x768: > padman 140.78 -> 145.98 fps > openarena 183.72 -> 186.87 fps > gtkperf ComboBoxEntry 20.27 -> 22.14s > gtkperf pixbuf 1.12 -> 1.47s > x11perf -aa10text 13.40 -> 13.20 Mglyphs > which are well within the throttling noise. > > v2 [ickle]: adapt to comply with existing userspace guarantees > > Signed-off-by: Ville Syrj?l? <ville.syrjala at linux.intel.com> > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk> Oops, this one here somehow fell a bit through the cracks. Two bikesheds and one real issue: - s/bool gfdt/unsigned caching_flags/ for the gtt pte punchers. I expect more fun to come here ;-) - ring->flush has a pile of arguments, I guess we should coalesce invalidate/flush and the new internal into a simple flags array. I don't expect that we'll every switch invalidate/flush back to domain arrays from the current "it's just a bool, really" usage. Also, the above should be done in separate prep patches imo. The real deal is flushing cpu access, since that probably does not set the gfdt flag. So cpu reads are fine and don't require any flushing, but cpu writes must be clflushed I think. In a way gfdt works as if the gpu is doing write-through caching (safe that we have to manually initiate the write-through with the gfdt flush). But from the pov of cpu access it's the same, and hence requires the same amount of clflushing. Hence I'm leaning towards adding a new I915_CACHING_WT cache_level. Ofc for gfdt we still need to keep track of the gfdt_dirty state, but all the cpu side flushing business should be much clearer. Stumbled over this while checking whether sw_finish_ioctl would still do the right thing, and noticed that the clflush is a noop when gfdt is treated like the current CACHE_LLC. Cheers, Daniel PS: Ben Widawsky noticed that on ivb we set a stupid ppgtt pte caching override, which results in pte cachelines being tagged with gfdt. Hsw has an equivalent mode to allow the gpu to write through. I have no idea what's the point of this and we never write ptes from the gpu. So can you please throw the relevant patch on top to disable gfdt for ppgtt ptes in ECOCHK? -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch