Re: [PATCH v2 2/2] drm/i915/bxt: Fix inadvertent CPU snooping due to incorrect MOCS config

Imre Deak <imre.deak@xxxxxxxxx> · Tue, 26 Apr 2016 19:01:06 +0300

On ti, 2016-04-26 at 15:42 +0100, Chris Wilson wrote:
> On Tue, Apr 26, 2016 at 05:26:43PM +0300, Eero Tamminen wrote:
> > Hi,
> > 
> > On 26.04.2016 16:23, Chris Wilson wrote:
> > > On Tue, Apr 26, 2016 at 04:17:55PM +0300, Imre Deak wrote:
> > > > On ti, 2016-04-26 at 13:57 +0100, Chris Wilson wrote:
> > > > > On Tue, Apr 26, 2016 at 03:44:22PM +0300, Imre Deak wrote:
> > > > > > Setting a write-back cache policy in the MOCS entry
> > > > > > definition also
> > > > > > implies snooping, which has a considerable overhead. This
> > > > > > is
> > > > > > unexpected for a few reasons:
> > > > > 
> > > > > If it is snooping, then I don't see why it is undesirable to
> > > > > have it
> > > > > available in a mocs setting. If it is bogus and the bit is
> > > > > undefined,
> > > > > then by all means remove it.
> > > > 
> > > > None of these entries are used alone for coherent surfaces. For
> > > > that
> > > > the application would have to use entry index#1 or #2 _and_
> > > > call the
> > > > set caching IOCTL to set the corresponding buffer to be cached.
> > > 
> > > No, the application doesn't. There are sufficent interfaces
> > > exposed that
> > > userspace can bypass the kernel if it so desired.
> > > 
> > > > The
> > > > problem is that without setting the buffer to be cacheable the
> > > > expectation is that we won't be snooping and incur the
> > > > corresponding
> > > > overhead. This is what this patch addresses.
> > > 
> > > Not true.
> > > 
> > > > The bit is also bogus, if we wanted snooping via MOCS we'd use
> > > > the
> > > > dedicated HW flag for that.
> > > 
> > > But you keep saying this bit *enables* snooping. So either it
> > > does or it
> > > doesn't.
> > > 
> > > > If we wanted to have a snooping MOCS entry we should add that
> > > > separately (as a forth entry), but we'd need this change as a
> > > > fix for
> > > > current users.
> > > 
> > > The current users who are getting what they request but don't
> > > know what
> > > they were requesting?
> > 
> > What this kernel ABI (index entry #2) has been agreed & documented
> > to provide?
> 
> The ABI is what we agree makes sense between hardware / kernel /
> userspace, and then we stick to it.
> 
> > I thought this entry is supposed to replace the writeback LLC/eLLC
> > cache MOCS setting Mesa is using on (e.g. BDW) to speed up accesses
> > to a memory area which it knows always to be accessed so that it
> > can
> > be cached.
> 
> Which it does... Only it speeds us writeback from the CPU, not the
> GPU. ;)
> 
> The confusion seems to be in mistaking !llc for llc. We have to come
> to
> some agreement on whether it makes sense having multiple entries for
> the
> same follows-PTE mocs or whether it makes more sense to expose the hw
> capabilties.

Note that we can't just say to Mesa to use index #0 on BXT instead of
index #2, since index #0 turns off caching in GPU L3, so we'd have to
also redefine that to be L3 cached. And I don't know what turning on L3
caching for index #0 would break, I would rather avoid doing that. Imo
defining the entries the following way would allow us to use the same
indexes on GEN9 regardless of it being SKL or BXT:
#0: uncached (neither in L3 nor in (e)LLC)
#1: PTE passthrough
#2: cached everywhere (L3 + (e)LLC if it exists), non-coherent
#3: cached everywhere, coherent

I'm not sure if there is even a user for coherent surfaces atm, so then
we could delay adding #3 until it's really needed.

--Imre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx