On Thu, Apr 28, 2016 at 04:48:37PM +0200, Daniel Vetter wrote: > On Thu, Apr 28, 2016 at 11:38:55AM +0300, Imre Deak wrote: > > On to, 2016-04-28 at 10:17 +0200, Daniel Vetter wrote: > > > On Tue, Apr 26, 2016 at 07:01:06PM +0300, Imre Deak wrote: > > > > On ti, 2016-04-26 at 15:42 +0100, Chris Wilson wrote: > > > > > On Tue, Apr 26, 2016 at 05:26:43PM +0300, Eero Tamminen wrote: > > > > > > Hi, > > > > > > > > > > > > On 26.04.2016 16:23, Chris Wilson wrote: > > > > > > > On Tue, Apr 26, 2016 at 04:17:55PM +0300, Imre Deak wrote: > > > > > > > > On ti, 2016-04-26 at 13:57 +0100, Chris Wilson wrote: > > > > > > > > > On Tue, Apr 26, 2016 at 03:44:22PM +0300, Imre Deak wrote: > > > > > > > > > > Setting a write-back cache policy in the MOCS entry > > > > > > > > > > definition also > > > > > > > > > > implies snooping, which has a considerable overhead. This > > > > > > > > > > is > > > > > > > > > > unexpected for a few reasons: > > > > > > > > > > > > > > > > > > If it is snooping, then I don't see why it is undesirable to > > > > > > > > > have it > > > > > > > > > available in a mocs setting. If it is bogus and the bit is > > > > > > > > > undefined, > > > > > > > > > then by all means remove it. > > > > > > > > > > > > > > > > None of these entries are used alone for coherent surfaces. For > > > > > > > > that > > > > > > > > the application would have to use entry index#1 or #2 _and_ > > > > > > > > call the > > > > > > > > set caching IOCTL to set the corresponding buffer to be cached. > > > > > > > > > > > > > > No, the application doesn't. There are sufficent interfaces > > > > > > > exposed that > > > > > > > userspace can bypass the kernel if it so desired. > > > > > > > > > > > > > > > The > > > > > > > > problem is that without setting the buffer to be cacheable the > > > > > > > > expectation is that we won't be snooping and incur the > > > > > > > > corresponding > > > > > > > > overhead. This is what this patch addresses. > > > > > > > > > > > > > > Not true. > > > > > > > > > > > > > > > The bit is also bogus, if we wanted snooping via MOCS we'd use > > > > > > > > the > > > > > > > > dedicated HW flag for that. > > > > > > > > > > > > > > But you keep saying this bit *enables* snooping. So either it > > > > > > > does or it > > > > > > > doesn't. > > > > > > > > > > > > > > > If we wanted to have a snooping MOCS entry we should add that > > > > > > > > separately (as a forth entry), but we'd need this change as a > > > > > > > > fix for > > > > > > > > current users. > > > > > > > > > > > > > > The current users who are getting what they request but don't > > > > > > > know what > > > > > > > they were requesting? > > > > > > > > > > > > What this kernel ABI (index entry #2) has been agreed & documented > > > > > > to provide? > > > > > > > > > > The ABI is what we agree makes sense between hardware / kernel / > > > > > userspace, and then we stick to it. > > > > > > > > > > > I thought this entry is supposed to replace the writeback LLC/eLLC > > > > > > cache MOCS setting Mesa is using on (e.g. BDW) to speed up accesses > > > > > > to a memory area which it knows always to be accessed so that it > > > > > > can > > > > > > be cached. > > > > > > > > > > Which it does... Only it speeds us writeback from the CPU, not the > > > > > GPU. ;) > > > > > > > > > > The confusion seems to be in mistaking !llc for llc. We have to come > > > > > to > > > > > some agreement on whether it makes sense having multiple entries for > > > > > the > > > > > same follows-PTE mocs or whether it makes more sense to expose the hw > > > > > capabilties. > > > > > > > > Note that we can't just say to Mesa to use index #0 on BXT instead of > > > > index #2, since index #0 turns off caching in GPU L3, so we'd have to > > > > also redefine that to be L3 cached. And I don't know what turning on L3 > > > > caching for index #0 would break, I would rather avoid doing that. Imo > > > > defining the entries the following way would allow us to use the same > > > > indexes on GEN9 regardless of it being SKL or BXT: > > > > #0: uncached (neither in L3 nor in (e)LLC) > > > > #1: PTE passthrough > > > > > > So our rendercpy in igt does set MOCS entry 1. Or how exactly do all the > > > set_caching vs. rendercpy tests we currently have pass? Just not? > > > > We don't have tests for coherent surfaces. The current rendercpy test > > just uses uncached buffers, so they are flushed before checking the > > result. I could add a new subtest to rendercpy to test with a > > cached/coherent surface. > > > > > Also, you're guaranateeing that opencl/libva don't screw this up either? > > > > If they don't set the given buffer to be cached via the set_caching > > IOCTL (as a consequence making them coherent) they are already screwed > > on CHV. If they call the IOCTL they are fine on BXT too. > > We do implicit set_caching when displaying something to something > coherent. To make that work userspace should use the "use PTE" mode by > default, except when they really know what they're doing. That's also the > mode that's supposed to give you the most reasonable performance. But > somehow that mode ended up in MOCS entry 1, so pretty much guaranteed > userspace will get it wrong. Mesa just hit a perf snag, but might as well > have been visual corruption. I think it'd be a lot safe to make "use PTE" > entry 0. If no one so far uses MOCS entry 0, we could redefine it safely. Mesa doesn't use it (at least on purpose, it might by accident though). I have no idea about anyone else. -- Ville Syrjälä Intel OTC _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx