On Sun, Mar 03, 2013 at 05:28:52PM +0100, Daniel Vetter wrote: > On Fri, Mar 01, 2013 at 08:32:57PM +0200, ville.syrjala at linux.intel.com wrote: > > From: Ville Syrj?l? <ville.syrjala at linux.intel.com> > > > > Currently all scanout buffers must be uncached because the > > display controller doesn't snoop the LLC. SNB introduced another > > method to guarantee coherency for the display controller. It's > > called the GFDT or graphics data type. > > > > Pages that have the GFDT bit enabled in their PTEs get flushed > > all the way to memory when a MI_FLUSH_DW or PIPE_CONTROL is > > issued with the "synchronize GFDT" bit set. > > > > So rather than making all scanout buffers uncached, set the GFDT > > bit in their PTEs, and modify the ring flush functions to enable > > the "synchronize GFDT" bit. > > > > On HSW the GFDT bit was removed from the PTE, and it's only present in > > surface state, so we can't really set it from the kernel. Also the docs > > state that the hardware isn't actually guaranteed to respect the GFDT > > bit. So it looks like GFDT isn't all that useful on HSW. > > > > So far I've tried this very quickly on an IVB machine, and > > it seems to be working as advertised. No idea if it does any > > good though. > > > > TODO: > > - make sure there are no missing flushes (CPU access doesn't > > respect GFDT after all). > > - measure it and see if there's some real benefit > > - maybe we can track whether "synchronize GFDT" needs to be > > issued, and skip it when possible. needs some numbers to > > determine if it's worthwile. > > > > Signed-off-by: Ville Syrj?l? <ville.syrjala at linux.intel.com> > > Iirc when I've tried this out a while back it regressed a few benchmarks. > Chris&me suspected cache trahsing, but hard to tell without proper > instrumentation support. Chris played around with a few tricks to mark > other giant bos as uncacheable, but he couldn't find any improved > workloads. I see. I didn't realize this was tried already. Not that I really planned to implement this in the first place. I was just studying the code and figured I'd learn better by trying to change things a bit ;) But if there's interest I can of course try to improve it further. > In short I think this needs to come with decent amounts of numbers > attached, like the TODO says ;-) > > The other thing was that I didn't manage to get things to work properly, > leaving some random cachline dirt on the screen. Looking at your code, you > add the gfdt flush to every ring_flush, whereas I've tried to be clever > and only flushed after batches rendering to the frontbuffer. So probably a > bug in my code, or a flush on a given ring doesn't flush out caches for > one of the other engines. I had a bug in my first attempt where I forgot the initial clflush. That left some random crap on the screen until I rendered the next frame w/ GFDT already enabled. Other than that I didn't see any corruption except when I intentionally left out the flushes. But as stated my code does too many flushes probably. -- Ville Syrj?l? Intel OTC