>>> [snip] >>>>> @@ -27,15 +28,8 @@ static bool gpu_write_needs_clflush(struct >>>>> drm_i915_gem_object *obj) >>>> >>>> The code change here looks accurate, but while we're here, I have a >>>> side question about this function in general...it was originally >>>> introduced in commit 48004881f693 ("drm/i915: Mark CPU cache as >>>> dirty when used for >>>> rendering") which states that GPU rendering ends up in the CPU cache >>>> (and thus needs a clflush later to make sure it lands in memory). >>>> That makes sense to me for LLC platforms, but is it really true for >>>> non-LLC snooping platforms (like MTL) as the commit states? >>> >>> For non-LLC platforms objects can be set to 1-way coherent which >>> means GPU rendering ending up in CPU cache as well, so for non-LLC >>> platform the logic here should be checking 1-way coherent flag. >> >> That's the part that I'm questioning (and not just for MTL, but for >> all of our other non-LLC platforms too). Just because there's >> coherency doesn't mean that device writes landed in the CPU cache. >> Coherency is also achieved if device writes invalidate the contents of the CPU cache. >> I thought our non-LLC snooping platforms were coherent due to >> write-invalidate rather than write-update, but I can't find it >> specifically documented anywhere at the moment. If write-invalidate >> was used, then there shouldn't be a need for a later clflush either. > > [Trying to consolidate by doing a combined reply to the discussion so far.] > > On the write-invalidate vs write-update I don't know. If you did not > find it in bspec then I doubt I would. I can have a browse still. Matt was correct. Quote Ron Silvas from SW ARCH, "MTL GPU doesn't write to CPU cache, it simply snoop CPU cache on its way to RAM." >>>> My understanding >>>> was that snooping platforms just invalidated the CPU cache to >>>> prevent future CPU reads from seeing stale data but didn't actually >>>> stick any new data in there? Am I off track or is the original >>>> logic of this function not quite right? >>>> >>>> Anyway, even if the logic of this function is wrong, it's a mistake >>>> that would only hurt performance >>> >>> Yes, this logic will introduce performance impact because it's >>> missing the checking for obj->pat_set_by_user. For objects with >>> pat_set_by_user==true, even if the object is snooping or 1-way >>> coherent, we don't want to enforce a clflush here since the >>> coherency is supposed to be handled by user space. > > What should I add you think to fix it? I think the simplest would be if (obj->pat_set_by_user) return false; because even checking for incoherent WB is unnecessary, simply no need for the KMD to initiate a flush if PAT is set by user. > Add a check for non-coherent WB in gpu_write_needs_clflush as an additional condition for returning false? > > And then if Matt is correct write-invalidate is used also !HAS_LLC should just return false? > >>>> (flushing more often than we truly need to) rather than >>>> functionality, so not something we really need to dig into right now >>>> as part of this patch. >>>> >>>>> if (IS_DGFX(i915)) >>>>> return false; >>>>> >>>>> - /* >>>>> - * For objects created by userspace through GEM_CREATE with pat_index >>>>> - * set by set_pat extension, i915_gem_object_has_cache_level() will >>>>> - * always return true, because the coherency of such object is managed >>>>> - * by userspace. Othereise the call here would fall back to checking >>>>> - * whether the object is un-cached or write-through. >>>>> - */ >>>>> - return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) || >>>>> - i915_gem_object_has_cache_level(obj, I915_CACHE_WT)); >>>>> + return i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_UC) != 1 && >>>>> + i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WT) != 1; >>>>> } >>> >>> [snip] >>>>> @@ -640,15 +640,9 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache, >>>>> if (DBG_FORCE_RELOC == FORCE_GTT_RELOC) >>>>> return false; >>>>> >>>>> - /* >>>>> - * For objects created by userspace through GEM_CREATE with pat_index >>>>> - * set by set_pat extension, i915_gem_object_has_cache_level() always >>>>> - * return true, otherwise the call would fall back to checking whether >>>>> - * the object is un-cached. >>>>> - */ >>>>> return (cache->has_llc || >>>>> obj->cache_dirty || >>>>> - !i915_gem_object_has_cache_level(obj, I915_CACHE_NONE)); >>>>> + i915_gem_object_has_cache_mode(obj, >>>>> + I915_CACHE_MODE_UC) != 1); >>>> >>>> Platforms with relocations and platforms with user-specified PAT >>>> have no overlap, right? So a -1 return should be impossible here >>>> and this is one case where we could just treat the return value as >>>> a boolean, right? >>> > > Hm no, or maybe. My thinking behind tri-state is to allow a safe option > for "don't know". In case PAT index to cache mode table is not fully > populated on some future platform. That would be a problem in the cache mode table. At least max_pat_index should have guaranteed the PAT index is sane. -Fei