On Wed, Jul 14, 2021 at 02:42:53PM +0300, Ville Syrjälä wrote:
> On Wed, Jul 14, 2021 at 01:16:57PM +0200, Daniel Vetter wrote:
> > On Tue, Jul 13, 2021 at 09:46:30PM +0300, Ville Syrjälä wrote:
> > > On Tue, Jul 13, 2021 at 07:24:23PM +0100, Matthew Auld wrote:
> > > > On Tue, 13 Jul 2021 at 18:47, Ville Syrjälä
> > > > <ville.syrjala@xxxxxxxxxxxxxxx> wrote:
> > > > >
> > > > > On Tue, Jul 13, 2021 at 05:13:37PM +0100, Matthew Auld wrote:
> > > > > > On Tue, 13 Jul 2021 at 16:55, Ville Syrjälä
> > > > > > <ville.syrjala@xxxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > On Tue, Jul 13, 2021 at 11:45:50AM +0100, Matthew Auld wrote:
> > > > > > > > + /**
> > > > > > > > +  * @cache_coherent:
> > > > > > > > +  *
> > > > > > > > +  * Track whether the pages are coherent with the GPU if reading or
> > > > > > > > +  * writing through the CPU cache.
> > > > > > > > +  *
> > > > > > > > +  * This largely depends on the @cache_level, for example if the object
> > > > > > > > +  * is marked as I915_CACHE_LLC, then GPU access is coherent for both
> > > > > > > > +  * reads and writes through the CPU cache.
> > > > > > > > +  *
> > > > > > > > +  * Note that on platforms with shared-LLC support(HAS_LLC) reads through
> > > > > > > > +  * the CPU cache are always coherent, regardless of the @cache_level. On
> > > > > > > > +  * snooping based platforms this is not the case, unless the full
> > > > > > > > +  * I915_CACHE_LLC or similar setting is used.
> > > > > > > > +  *
> > > > > > > > +  * As a result of this we need to track coherency separately for reads
> > > > > > > > +  * and writes, in order to avoid superfluous flushing on shared-LLC
> > > > > > > > +  * platforms, for reads.
> > > > > > > > +  *
> > > > > > > > +  * I915_BO_CACHE_COHERENT_FOR_READ:
> > > > > > > > +  *
> > > > > > > > +  * When reading through the CPU cache, the GPU is still coherent. Note
> > > > > > > > +  * that no data has actually been modified here, so it might seem
> > > > > > > > +  * strange that we care about this.
> > > > > > > > +  *
> > > > > > > > +  * As an example, if some object is mapped on the CPU with write-back
> > > > > > > > +  * caching, and we read some page, then the cache likely now contains
> > > > > > > > +  * the data from that read. At this point the cache and main memory
> > > > > > > > +  * match up, so all good. But next the GPU needs to write some data to
> > > > > > > > +  * that same page. Now if the @cache_level is I915_CACHE_NONE and the
> > > > > > > > +  * platform doesn't have the shared-LLC, then the GPU will
> > > > > > > > +  * effectively skip invalidating the cache(or however that works
> > > > > > > > +  * internally) when writing the new value. This is really bad since the
> > > > > > > > +  * GPU has just written some new data to main memory, but the CPU cache
> > > > > > > > +  * is still valid and now contains stale data. As a result the next time
> > > > > > > > +  * we do a cached read with the CPU, we are rewarded with stale data.
> > > > > > > > +  * Likewise if the cache is later flushed, we might be rewarded with
> > > > > > > > +  * overwriting main memory with stale data.
> > > > > > > > +  *
> > > > > > > > +  * I915_BO_CACHE_COHERENT_FOR_WRITE:
> > > > > > > > +  *
> > > > > > > > +  * When writing through the CPU cache, the GPU is still coherent. Note
> > > > > > > > +  * that this also implies I915_BO_CACHE_COHERENT_FOR_READ.
> > > > > > > > +  *
> > > > > > > > +  * This is never set when I915_CACHE_NONE is used for @cache_level,
> > > > > > > > +  * where instead we have to manually flush the caches after writing
> > > > > > > > +  * through the CPU cache. For other cache levels this should be set and
> > > > > > > > +  * the object is therefore considered coherent for both reads and writes
> > > > > > > > +  * through the CPU cache.
> > > > > > >
> > > > > > > I don't remember why we have this read vs. write split and this new
> > > > > > > documentation doesn't seem to really explain it either.
> > > > > >
> > > > > > Hmm, I attempted to explain that earlier:
> > > > > >
> > > > > > * Note that on platforms with shared-LLC support(HAS_LLC) reads through
> > > > > > * the CPU cache are always coherent, regardless of the @cache_level. On
> > > > > > * snooping based platforms this is not the case, unless the full
> > > > > > * I915_CACHE_LLC or similar setting is used.
> > > > > > *
> > > > > > * As a result of this we need to track coherency separately for reads
> > > > > > * and writes, in order to avoid superfluous flushing on shared-LLC
> > > > > > * platforms, for reads.
> > > > > >
> > > > > > So AFAIK it's just because shared-LLC can be coherent for reads, while
> > > > > > also not being coherent for writes(CACHE_NONE),
> > > > >
> > > > > CPU vs. GPU is fully coherent when it comes to LLC. Or at least I've
> > > > > never heard of any mechanism that would make it only partially coherent.
> > > >
> > > > What do you mean by "comes to LLC", are you talking about HAS_LLC() or
> > > > I915_CACHE_LLC?
> > >
> > > I'm talking about the actual cache.
> > > >
> > > > If you set I915_CACHE_LLC, then yes it is fully coherent for both
> > > > HAS_LLC() and HAS_SNOOP().
> > > >
> > > > If you set I915_CACHE_NONE, then reads are still coherent on
> > > > HAS_LLC(),
> > >
> > > Reads and writes both. The only thing that's not coherent is the
> > > display engine.
> >
> > There's a lot of code which seems to disagree,
>
> Can't even imagine why anyone would make a cache coherency protocol
> that only handles reads but not writes...
>
> > plus there's now this new
> > MOCS thing.
>
> That's just a full LLC bypass AFAICS. Can't omit invalidates if
> you use that one or you'll just get stale data from the cache
> on reads as well.
>
> > I really hope we don't have all those cache coherency bits
> > just because the code complexity is entertaining?
>
> They were definitely added to fix a display issue, and before
> that it was just a single flag, which wasn't doing what the display
> needed. I think before the flag was added we used some other indicators
> to check when we need to clflush, or maybe we did some extra pointless
> clflushes here and there and the broken single flag was supposed to
> avoid those. Not quite sure.

Hm I thought cache_dirty was added for that display case? But yeah this
entire model is maybe not super well-defined in terms of all the
use-cases and what exactly it means ... I'm also not clear why this is
on the object, and not some kind of function which computes it from the
platform + cache_level.

> I suppose these two flags should maybe have been named more like
> "needs invalidate" and "needs writeback" to make it clear what
> one needs to do.

tbh I have no idea, this all started to get added after I disappeared
for a few years into display.
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
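
For illustration, a rough sketch of what "some kind of function which computes
it from the platform + cache_level" could look like, following the behaviour
the quoted kernel-doc describes. The helper name is hypothetical and this is
not the in-tree code; HAS_LLC(), I915_CACHE_NONE and the
I915_BO_CACHE_COHERENT_* flags are the driver definitions discussed above.

/*
 * Hypothetical sketch, not the in-tree helper: derive the coherency
 * flags from the platform and the cache level alone.
 */
static unsigned int
i915_gem_compute_cache_coherency(struct drm_i915_private *i915,
                                 enum i915_cache_level cache_level)
{
        if (cache_level != I915_CACHE_NONE)
                /* LLC or snooped: CPU reads and writes are both coherent. */
                return I915_BO_CACHE_COHERENT_FOR_READ |
                       I915_BO_CACHE_COHERENT_FOR_WRITE;

        if (HAS_LLC(i915))
                /*
                 * Uncached object on a shared-LLC platform: CPU reads still
                 * snoop, but CPU writes need a manual clflush afterwards
                 * ("needs writeback" in the naming suggested above).
                 */
                return I915_BO_CACHE_COHERENT_FOR_READ;

        /* Uncached object on a snooping platform: coherent for neither. */
        return 0;
}

With something along those lines, @cache_coherent on the object would just
cache the result of this computation whenever the cache level changes, and
"needs invalidate" / "needs writeback" reduce to testing the READ and WRITE
bits respectively.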