On Thu, Apr 04, 2013 at 02:27:12PM +0300, Ville Syrj?l? wrote: > On Thu, Apr 04, 2013 at 11:01:02AM +0200, Daniel Vetter wrote: > > On Wed, Mar 06, 2013 at 04:28:09PM +0000, Chris Wilson wrote: > > > From: Ville Syrj?l? <ville.syrjala at linux.intel.com> > > > > > > Currently all scanout buffers must be uncached because the > > > display controller doesn't snoop the LLC. SNB introduced another > > > method to guarantee coherency for the display controller. It's > > > called the GFDT or graphics data type. > > > > > > Pages that have the GFDT bit enabled in their PTEs get flushed > > > all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is > > > issued with the "synchronize GFDT" bit set. > > > > > > So rather than making all scanout buffers uncached, set the GFDT > > > bit in their PTEs, and modify the ring flush functions to enable > > > the "synchronize GFDT" bit. > > > > > > On HSW the GFDT bit was removed from the PTE, and it's only present in > > > surface state, so we can't really set it from the kernel. Also the docs > > > state that the hardware isn't actually guaranteed to respect the GFDT > > > bit. So it looks like GFDT isn't all that useful on HSW. > > > > > > So far I've tried this very quickly on an IVB machine, and > > > it seems to be working as advertised. No idea if it does any > > > good though. > > > > > > On an i5-2520m (laptop) running gnome-shell at 1366x768: > > > padman 140.78 -> 145.98 fps > > > openarena 183.72 -> 186.87 fps > > > gtkperf ComboBoxEntry 20.27 -> 22.14s > > > gtkperf pixbuf 1.12 -> 1.47s > > > x11perf -aa10text 13.40 -> 13.20 Mglyphs > > > which are well within the throttling noise. > > > > > > v2 [ickle]: adapt to comply with existing userspace guarantees > > > > > > Signed-off-by: Ville Syrj?l? <ville.syrjala at linux.intel.com> > > > Signed-off-by: Chris Wilson <chris at chris-wilson.co.uk> > > > > Oops, this one here somehow fell a bit through the cracks. > > > > Two bikesheds and one real issue: > > - s/bool gfdt/unsigned caching_flags/ for the gtt pte punchers. I expect > > more fun to come here ;-) > > - ring->flush has a pile of arguments, I guess we should coalesce > > invalidate/flush and the new internal into a simple flags array. I don't > > expect that we'll every switch invalidate/flush back to domain arrays > > from the current "it's just a bool, really" usage. > > > > Also, the above should be done in separate prep patches imo. > > > > The real deal is flushing cpu access, since that probably does not set the > > gfdt flag. So cpu reads are fine and don't require any flushing, but cpu > > writes must be clflushed I think. In a way gfdt works as if the gpu is > > doing write-through caching (safe that we have to manually initiate the > > write-through with the gfdt flush). But from the pov of cpu access it's > > the same, and hence requires the same amount of clflushing. > > > > Hence I'm leaning towards adding a new I915_CACHING_WT cache_level. Ofc > > for gfdt we still need to keep track of the gfdt_dirty state, but all the > > cpu side flushing business should be much clearer. > > > > Stumbled over this while checking whether sw_finish_ioctl would still do > > the right thing, and noticed that the clflush is a noop when gfdt is > > treated like the current CACHE_LLC. > > My original patch had a change to clflushing where pinned objects w/ gfdt > were also flushed. I didn't really read v2 well enough to figure out why > that got dropped. Because it was the wrong approach. The API where the scanout is to be made coherent with CPU mmaps is sw_finish. The only user of that is early pre-gen5 DDX. So imo the original patch was abusing a generic flag and applying the flushes at the wrong points, and I simply didn't care about badly designed API such as sw_finish. -Chris -- Chris Wilson, Intel Open Source Technology Centre