On 23/04/2023 07:52, Yang, Fei wrote:
On 20/04/2023 00:00, fei.yang@xxxxxxxxx wrote:
From: Fei Yang <fei.yang@xxxxxxxxx>
Currently the KMD is using enum i915_cache_level to set caching policy for
buffer objects. This is flaky because the PAT index which really controls
the caching behavior in PTE has far more levels than what's defined in the
enum. In addition, the PAT index is platform dependent, having to translate
between i915_cache_level and PAT index is not reliable, and makes the code
more complicated.
>From UMD's perspective there is also a necessity to set caching policy for
performance fine tuning. It's much easier for the UMD to directly use PAT
index because the behavior of each PAT index is clearly defined in Bspec.
Having the abstracted i915_cache_level sitting in between would only cause
more ambiguity.
For these reasons this patch replaces i915_cache_level with PAT index. Also
note, the cache_level is not completely removed yet, because the KMD still
has the need of creating buffer objects with simple cache settings such as
cached, uncached, or writethrough. For such simple cases, using cache_level
would help simplify the code.
Cc: Chris Wilson <chris.p.wilson@xxxxxxxxxxxxxxx>
Cc: Matt Roper <matthew.d.roper@xxxxxxxxx>
Signed-off-by: Fei Yang <fei.yang@xxxxxxxxx>
Reviewed-by: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
I think have some ideas no how to perhaps make this simpler, please bear
with me.
In my mind get/set caching ioctls need to be failing once explicit pat
index has been set by userspace. Or at least not return false information.
By design we are ending the support for set caching ioctl. The patch is included
in this series, "drm/i915/mtl: end support for set caching ioctl"
+ if (GRAPHICS_VER_FULL(i915) >= IP_VER(12, 70))
+ return -EOPNOTSUPP;
+
And I don't like i915_gem_object_has_cache_level and
i915_gem_get_pat_index as a refactoring step.
It also seems that the driver has a need to query the caching mode set
regardless of the route (of setting).
Only for the objects created by the KMD. For UMD created objects with PAT
index set KMD should never touch the setting.
So how about this.
Three callers which query the caching mode: use_cpu_reloc, vm_fault_gtt,
gpu_write_needs_clflush.
We convert them to be like:
i915_gem_object_has_caching_mode(obj, PAT_UC / PAT_WT / ...);
PAT_UC/WT/WB are platform dependent (https://gfxspecs.intel.com/Predator/Home/Index/45101),
performing this check you would have to do something like,
if (MTL)
...
else if (PVC)
...
else if (GEN12)
...
else
...
No, it would be doable with a table as I suggested below. And that table
could be re-used for debugfs pretty-printing simplifying that code too.
Then apart from the per platform tables for mapping between cache level
to pat index, you add tables which map pat index to caching modes
(PAT_UC, etc, naming TBD, just enums or bitmasks also TBD, I haven't
looked at the bspec to see how exactly it works).
You would use that table in the i915_gem_object_has_caching_mode helper,
called from the above three functions instead of obj->cache_level direct
comparison.
I am assuming at least for instance cache_level != I915_CACHE_NONE would
be equivalent to i915_gem_object_has_caching_mode(obj, PAT_UC), etc.
So far kernel only needs 4 cache levels defined in enum i915_cache_level,
kernel doesn't need to understand all PAT indices. By desgin if the userspace
is setting PAT index directly, kernel only needs to pass the setting to PTE.
For objects created by kernel (including objects created by userspace without
specifying pat index), there are only 4 options (defined in the cachelevel_to_pat).
For objects created by userspace with PAT index set (GEM_CREATE + set_pat extension),
kernel should not touch the setting, just pass it to the PAT index bits in PTE.
That's why I was only checking cache_level. Handling PAT index is much more
complicated because of its platform dependent nature and even the number of
PAT indices varies from platform to platform. Fortunately kernel doesn't need
to understand that.
Yeah but I think you maybe missed the spirit of my proposal - which is
to simplify the internal code paths by not having the duality of
cache_level-vs-pat almost all the way down. But instead cut it at the
top API level.
You have this:
+ .cachelevel_to_pat = { \
+ [I915_CACHE_NONE] = 0, \
+ [I915_CACHE_LLC] = 1, \
+ [I915_CACHE_L3_LLC] = 2, \
+ [I915_CACHE_WT] = 3, \
+ }
I propose to add something like:
.legacy_platform_pat = { /* pat index to driver logical flags */
[0] = PAT_UC,
[1] = PAT_WB | PAT_LLC, /* Just illustrating the principle */
};
i915->platform_pat = &legacy_platform_pat
i915_gem_object_has_caching_mode(obj, PAT_UC)
...
return i915->platform_pat & PAT_UC == PAT_UC /* give or take */
get/set_caching_ioctl
{
...
if (obj->has_pat_index) /* set in the extension only */
return -EINVAL;
debugfs:
i915_show_pat_flags(i915->platform_pat[obj->pat_index]); /* generic! */
etc...
Set obj->pat_index in the extension or set_cache_level _only_. No
internal code paths then need to use anything but it. No sprinkling of
conversion helpers needed or dubious has_cache_level query.
Regards,
Tvrtko
Same mapping table could also be used in debugfs (i915_cache_level_str)
to universally describe any obj->pat_index, with no need to have
anything platform dependend there.
In set caching set you always set obj->pat_index and so low level code
can always just use that.
Unless I am missing something (possible) I think like that we end up
with no i915_gem_get_pat_index sprinkled around and also no confusing
i915_gem_object_has_cache_level.
Obj->pat_index would be a single point of truth, while obj->cache_level
is just a legacy field for get/set_caching ioctl - not used in the
internal driver flows.
We would need an additional field for storing the boolean of whether
userspace had overriden the PAT.
Regards,
Tvrtko