Re: [RFC 2/2] drm/i915: Remove PAT hack from i915_gem_object_can_bypass_llc

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Mon, 17 Jul 2023 11:55:30 +0100

On 15/07/2023 01:20, Matt Roper wrote:
On Fri, Jul 14, 2023 at 11:11:30AM +0100, Tvrtko Ursulin wrote:

On 14/07/2023 06:43, Yang, Fei wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

According to the comment in i915_gem_object_can_bypass_llc the
purpose of the function is to return false if the platform/object
has a caching mode where GPU can bypass the LLC.

So far the only platforms which allegedly can do this are Jasperlake
and Elkhartlake, and that via MOCS (not PAT).

Instead of blindly assuming that objects where userspace has set the
PAT index can (bypass the LLC), question is is there a such PAT index
on a platform. Probably starting with Meteorlake since that one is the
only one where set PAT extension can be currently used. Or if there is
a MOCS entry which can achieve the same thing on Meteorlake.

If there is such PAT, now that i915 can be made to understand them
better, we can make the check more fine grained. Or if there is a MOCS
entry then we probably should apply the blanket IS_METEORLAKE condition.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Fixes: 9275277d5324 ("drm/i915: use pat_index instead of cache_level")
Cc: Chris Wilson <chris.p.wilson@xxxxxxxxxxxxxxx>
Cc: Fei Yang <fei.yang@xxxxxxxxx>
Cc: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
Cc: Matt Roper <matthew.d.roper@xxxxxxxxx>
---
   drivers/gpu/drm/i915/gem/i915_gem_object.c | 6 ------
   1 file changed, 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
index 33a1e97d18b3..1e34171c4162 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
@@ -229,12 +229,6 @@ bool i915_gem_object_can_bypass_llc(struct drm_i915_gem_object *obj)
        if (!(obj->flags & I915_BO_ALLOC_USER))
                return false;

-     /*
-      * Always flush cache for UMD objects at creation time.
-      */
-     if (obj->pat_set_by_user)

I'm afraid this is going to break MESA. Can we run MESA tests with this patch?

I can't, but question is why it would break Mesa which would need a nice
comment here?

For instance should the check be IS_METEORLAKE?

Or should it be "is wb" && "not has 1-way coherent"?

Or both?

Or, given how Meteorlake does not have LLC, how can anything bypass it
there? Or is it about snooping on Meteorlake and how?

I think the "LLC" in the function name is a bit misleading since this is
really all just about the ability to avoid coherency (which might come
from an LLC on some platforms or from snooping on others).

The concern is that the CPU writes to the buffer and those writes sit in
a CPU cache without making it to RAM immediately.  If the GPU then
reads the object with any of the non-coherent PAT settings that were
introduced in Xe_LPG, it will not snoop the CPU cache and will read old,
stale data from RAM.

So I think we'd want a condition like ("Xe_LPG or later" && "any non
coherent PAT").  The WB/WT/UC status of the GPU behavior shouldn't
matter here, just the coherency setting.

Right, sounds plausible to me. So with this series the new condition in this function would look like this:

i915_gem_object_can_bypass_llc(..)
{
...
	if (i915_gem_object_has_cache_mode(obj, I915_CACHE_MODE_WB) &&
	    i915_gem_object_has_cache_flag(obj, I915_CACHE_FLAG_COH1W) != 1)
		return true;

("!= 1" in the condition meaning either it is not coherent, or i915 does not know due table being incomplete - like some PAT index on some future platform was forgotten to be defined.)

That would catch any platform with non-coherent WB, as long as the PAT-to-i915-cache-mode tables are correct. It would currently only apply to Meteorlake:

#define MTL_CACHE_MODES \
	.cache_modes = { \
		[0] = I915_CACHE(WB), \
		[1] = I915_CACHE(WT), \
		[2] = I915_CACHE(UC), \
		[3] = _I915_CACHE(WB, COH1W), \
		[4] = __I915_CACHE(WB, BIT(I915_CACHE_FLAG_COH1W) | BIT(I915_CACHE_FLAG_COH2W)), \
	}

Or are saying it should apply to UC and WT too somehow?

I'll also try to join sub-threads to Fei's reply here too.

So in terms of the stated issue with _CPU_ access from Mesa seeing stale data (non-zeroed pages) depending on the PAT index - I don't understand that yet. That seems like a completely CPU cache problem space and I do not understand how PAT index gets into the picture.

But the proposed patch from your email Fei looks like it would be covered by the snippet I have in this reply.

Regards,

Tvrtko