On Mon, May 02, 2016 at 10:28:50AM +0200, Daniel Vetter wrote: > On Thu, Apr 28, 2016 at 08:15:24PM +0300, Imre Deak wrote: > > On to, 2016-04-28 at 16:48 +0200, Daniel Vetter wrote: > > > On Thu, Apr 28, 2016 at 11:38:55AM +0300, Imre Deak wrote: > > > > On to, 2016-04-28 at 10:17 +0200, Daniel Vetter wrote: > > > > > Also, you're guaranateeing that opencl/libva don't screw this up > > > > > either? > > > > > > > > If they don't set the given buffer to be cached via the set_caching > > > > IOCTL (as a consequence making them coherent) they are already > > > > screwed > > > > on CHV. If they call the IOCTL they are fine on BXT too. > > > > > > We do implicit set_caching when displaying something to something > > > coherent. To make that work userspace should use the "use PTE" mode by > > > default, except when they really know what they're doing. > > > That's also the mode that's supposed to give you the most reasonable > > > performance. But somehow that mode ended up in MOCS entry 1, so pretty much > > > guaranteed userspace will get it wrong. Mesa just hit a perf snag, but > > > might as well have been visual corruption. I think it'd be a lot safe to > > > make "use PTE" entry 0. > > > > Mesa uses entries 1 and 2. If something else like opencl or libva (or > > even Mesa for that matter) uses index 0 for PTE pass-through that's a > > bug on its own. I don't know if this is the case, but it's a separate > > issue from what I'm trying to fix here. > > > > This isn't about a case where a PTE pass-through entry needs to be > > provided, but about the case where a cached but non-coherent one is > > needed. Mesa assumes this to be entry 2 and I don't see why we couldn't > > make sure that this assumption holds. Note that this entry on BXT could > > be both a PTE pass-through one as in this patch or LLC-UC. > > Yeah my comment about entry #0 was is a different track of discussion. > Should still fix it up while we clarify what entries 1&2 really mean. > > When defining entries as "cached" please make triple sure what exactly you > mean by that. Since eLLC, LLC and on-gpu L3$ are all different caches, in > different parts of the coherency fabric. And L3$ has functional relevance > since if that's not set compute features fall apart. > > So maybe a better definition would be "L3$ cached (useful for > compute)+performance optimized otherwise for general use+might be > incoherent depending upon platform" for entry 2. That would make sense and > covers it all, but imo yours a bit too simple (assuming my understanding > of cache architecture on gen9 is accurate, they change it all the bloody > time). Note that e.g. on older platforms we could enable L3$ either in the > PTE, or in MOCS settings in the batch itself, which is why the "useful for > compute" only started to become relevant for gen9. And why we needed the > kernel MOCS patch really. L3 control via PTE died with IVB. Since then it's been MOCS only. -- Ville Syrjälä Intel OTC _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx