> On Sun, May 07, 2023 at 11:39:18PM -0700, Yang, Fei wrote:
>>> On Wed, May 03, 2023 at 03:50:59PM -0700, fei.yang@xxxxxxxxx wrote:
>>>> From: Fei Yang <fei.yang@xxxxxxxxx>
>>>>
>>>> Currently the KMD is using enum i915_cache_level to set caching policy for
>>>> buffer objects. This is flaky because the PAT index, which really controls
>>>> the caching behavior in PTE, has far more levels than what's defined in the
>>>> enum. In addition, the PAT index is platform dependent; having to translate
>>>> between i915_cache_level and PAT index is not reliable and makes the code
>>>> more complicated.
>>>>
>>>> From the UMD's perspective there is also a need to set caching policy for
>>>> performance fine-tuning. It's much easier for the UMD to use the PAT index
>>>> directly because the behavior of each PAT index is clearly defined in the
>>>> Bspec. Having the abstracted i915_cache_level sitting in between would only
>>>> cause more ambiguity.
>>>
>>> It may be worth mentioning here that PAT is expected to work much like
>>> MOCS already works today --- the contract on the exact platform-specific
>>> meaning of each index is documented in the hardware spec and userspace
>>> is expected to select the index that exactly matches the behavior it
>>> desires.
>> will update.
Please review https://patchwork.freedesktop.org/series/117480/
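For reference, UMD-side usage against that series would look roughly like
this (a minimal sketch; the names come from the proposed uapi and may
still change before merge, and the pat_index value is just an example):

	struct drm_i915_gem_create_ext_set_pat set_pat_ext = {
		.base = { .name = I915_GEM_CREATE_EXT_SET_PAT },
		/* platform-specific index, taken from the Bspec PAT table */
		.pat_index = 3,
	};
	struct drm_i915_gem_create_ext create_ext = {
		.size = 4096,
		.extensions = (uintptr_t)&set_pat_ext,
	};

	if (ioctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create_ext))
		return -errno; /* kernels without the extension reject it */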
>>>>
>>>> For these reasons this patch replaces i915_cache_level with PAT index. Also
>>>> note, the cache_level is not completely removed yet, because the KMD still
>>>> needs to create buffer objects with simple cache settings such as cached,
>>>> uncached, or writethrough. For such simple cases, using cache_level helps
>>>> simplify the code.
>>>
>>> See my comment farther down, but the implementation of
>>> i915_gem_object_has_cache_level() seems a bit confusing and you may want
>>> to elaborate on it here.
>>>
>>> Also somewhat confusing from a high-level skim is if/how we're still
>>> using obj->cache_coherent once userspace has taken direct control of the
>>> cache behavior. Some PAT indices give coherency and others don't (and
>>> the combinations will likely get more complicated on future platforms).
>>> If obj->cache_coherent is still being considered even once PAT indices
>>> are being controlled by userspace, I think we need some explanation of
>>> how that works in the commit message (and likely in the kerneldoc for
>>> that field too).
>> For objects with pat_index set by user space, coherency is managed by
>> user space. Things like obj->cache_coherent and obj->cache_dirty are still
>> there for objects created by the kernel.
>
> Right, that's the challenge --- userspace is taking over control of this
> stuff, but the fields are still around and still used internally within
> the driver. How we reconcile those two things needs to be clearly
> explained in the commit message and kerneldoc, otherwise we're going to
> wind up with confusing code that's very difficult to maintain down the
> road.
>
> All the cache behavior is complicated enough that it probably wouldn't
> be a bad idea to have a dedicated section of the kerneldoc focused on
> cache behavior.
Updated with kerneldoc and comments.
As discussed offline, I have also squashed the pte_encode patch.
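The new object field is documented roughly along these lines (a sketch of
the kerneldoc, not the exact patch text):

	/**
	 * @pat_set_by_user: Indicate whether pat_index is set by user space.
	 *
	 * This is false by default and only set to true when the pat_index
	 * was specified by user space. In that case user space is managing
	 * the caching behavior, and the KMD must never touch the pat_index
	 * or second-guess the object's coherency.
	 */
	unsigned int pat_set_by_user:1;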
>>>>
>>>> Cc: Chris Wilson <chris.p.wilson@xxxxxxxxxxxxxxx>
>>>> Cc: Matt Roper <matthew.d.roper@xxxxxxxxx>
>>>> Signed-off-by: Fei Yang <fei.yang@xxxxxxxxx>
>>>> Reviewed-by: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
>>>> ---
>>>> drivers/gpu/drm/i915/display/intel_dpt.c | 12 +--
>>>> drivers/gpu/drm/i915/gem/i915_gem_domain.c | 45 ++++++----
>>>> .../gpu/drm/i915/gem/i915_gem_execbuffer.c | 10 ++-
>>>> drivers/gpu/drm/i915/gem/i915_gem_mman.c | 3 +-
>>>> drivers/gpu/drm/i915/gem/i915_gem_object.c | 51 +++++++++++-
>>>> drivers/gpu/drm/i915/gem/i915_gem_object.h | 4 +
>>>> .../gpu/drm/i915/gem/i915_gem_object_types.h | 25 +++++-
>>>> drivers/gpu/drm/i915/gem/i915_gem_stolen.c | 4 +-
>>>> drivers/gpu/drm/i915/gem/i915_gem_ttm_move.c | 16 ++--
>>>> .../gpu/drm/i915/gem/selftests/huge_pages.c | 2 +-
>>>> .../drm/i915/gem/selftests/i915_gem_migrate.c | 2 +-
>>>> .../drm/i915/gem/selftests/i915_gem_mman.c | 2 +-
>>>> drivers/gpu/drm/i915/gt/gen6_ppgtt.c | 10 ++-
>>>> drivers/gpu/drm/i915/gt/gen8_ppgtt.c | 71 ++++++++--------
>>>> drivers/gpu/drm/i915/gt/gen8_ppgtt.h | 3 +-
>>>> drivers/gpu/drm/i915/gt/intel_ggtt.c | 82 +++++++++----------
>>>> drivers/gpu/drm/i915/gt/intel_gtt.h | 20 ++---
>>>> drivers/gpu/drm/i915/gt/intel_migrate.c | 47 ++++++-----
>>>> drivers/gpu/drm/i915/gt/intel_migrate.h | 13 ++-
>>>> drivers/gpu/drm/i915/gt/intel_ppgtt.c | 6 +-
>>>> drivers/gpu/drm/i915/gt/selftest_migrate.c | 47 ++++++-----
>>>> drivers/gpu/drm/i915/gt/selftest_reset.c | 8 +-
>>>> drivers/gpu/drm/i915/gt/selftest_timeline.c | 2 +-
>>>> drivers/gpu/drm/i915/gt/selftest_tlb.c | 4 +-
>>>> drivers/gpu/drm/i915/gt/uc/intel_uc_fw.c | 10 ++-
>>>> drivers/gpu/drm/i915/i915_debugfs.c | 52 +++++++++---
>>>> drivers/gpu/drm/i915/i915_gem.c | 16 +++-
>>>> drivers/gpu/drm/i915/i915_gpu_error.c | 8 +-
>>>> drivers/gpu/drm/i915/i915_vma.c | 16 ++--
>>>> drivers/gpu/drm/i915/i915_vma.h | 2 +-
>>>> drivers/gpu/drm/i915/i915_vma_types.h | 2 -
>>>> drivers/gpu/drm/i915/selftests/i915_gem.c | 5 +-
>>>> .../gpu/drm/i915/selftests/i915_gem_evict.c | 4 +-
>>>> drivers/gpu/drm/i915/selftests/i915_gem_gtt.c | 15 ++--
>>>> .../drm/i915/selftests/intel_memory_region.c | 4 +-
>>>> drivers/gpu/drm/i915/selftests/mock_gtt.c | 8 +-
>>>> 36 files changed, 391 insertions(+), 240 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/display/intel_dpt.c b/drivers/gpu/drm/i915/display/intel_dpt.c
>>>> index c5eacfdba1a5..7c5fddb203ba 100644
>>>> --- a/drivers/gpu/drm/i915/display/intel_dpt.c
>>>> +++ b/drivers/gpu/drm/i915/display/intel_dpt.c
>>>> @@ -43,24 +43,24 @@ static void gen8_set_pte(void __iomem *addr, gen8_pte_t pte)
>>>> static void dpt_insert_page(struct i915_address_space *vm,
>>>> dma_addr_t addr,
>>>> u64 offset,
>>>> - enum i915_cache_level level,
>>>> + unsigned int pat_index,
>>>> u32 flags)
>>>> {
>>>> struct i915_dpt *dpt = i915_vm_to_dpt(vm);
>>>> gen8_pte_t __iomem *base = dpt->iomem;
>>>>
>>>> gen8_set_pte(base + offset / I915_GTT_PAGE_SIZE,
>>>> - vm->pte_encode(addr, level, flags));
>>>> + vm->pte_encode(addr, pat_index, flags));
>>>> }
>>>>
>>>> static void dpt_insert_entries(struct i915_address_space *vm,
>>>> struct i915_vma_resource *vma_res,
>>>> - enum i915_cache_level level,
>>>> + unsigned int pat_index,
>>>> u32 flags)
>>>> {
>>>> struct i915_dpt *dpt = i915_vm_to_dpt(vm);
>>>> gen8_pte_t __iomem *base = dpt->iomem;
>>>> - const gen8_pte_t pte_encode = vm->pte_encode(0, level, flags);
>>>> + const gen8_pte_t pte_encode = vm->pte_encode(0, pat_index, flags);
>>>> struct sgt_iter sgt_iter;
>>>> dma_addr_t addr;
>>>> int i;
>>>> @@ -83,7 +83,7 @@ static void dpt_clear_range(struct i915_address_space *vm,
>>>> static void dpt_bind_vma(struct i915_address_space *vm,
>>>> struct i915_vm_pt_stash *stash,
>>>> struct i915_vma_resource *vma_res,
>>>> - enum i915_cache_level cache_level,
>>>> + unsigned int pat_index,
>>>> u32 flags)
>>>> {
>>>> u32 pte_flags;
>>>> @@ -98,7 +98,7 @@ static void dpt_bind_vma(struct i915_address_space *vm,
>>>> if (vma_res->bi.lmem)
>>>> pte_flags |= PTE_LM;
>>>>
>>>> - vm->insert_entries(vm, vma_res, cache_level, pte_flags);
>>>> + vm->insert_entries(vm, vma_res, pat_index, pte_flags);
>>>>
>>>> vma_res->page_sizes_gtt = I915_GTT_PAGE_SIZE;
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_domain.c b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>> index d2d5a24301b2..ae99b4be5918 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_domain.c
>>>> @@ -27,8 +27,8 @@ static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>>>> if (IS_DGFX(i915))
>>>> return false;
>>>>
>>>> - return !(obj->cache_level == I915_CACHE_NONE ||
>>>> - obj->cache_level == I915_CACHE_WT);
>>>> + return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>>>> + i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
>>>> }
>>>>
>>>> bool i915_gem_cpu_write_needs_clflush(struct drm_i915_gem_object *obj)
>>>> @@ -267,7 +267,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>>> {
>>>> int ret;
>>>>
>>>> - if (obj->cache_level == cache_level)
>>>> + if (i915_gem_object_has_cache_level(obj, cache_level))
>>>> return 0;
>>>>
>>>> ret = i915_gem_object_wait(obj,
>>>> @@ -278,10 +278,8 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>>>> return ret;
>>>>
>>>> /* Always invalidate stale cachelines */
>>>> - if (obj->cache_level != cache_level) {
>>>> - i915_gem_object_set_cache_coherency(obj, cache_level);
>>>> - obj->cache_dirty = true;
>>>> - }
>>>> + i915_gem_object_set_cache_coherency(obj, cache_level);
>>>> + obj->cache_dirty = true;
>>>>
>>>> /* The cache-level will be applied when each vma is rebound. */
>>>> return i915_gem_object_unbind(obj,
>>>> @@ -306,20 +304,22 @@ int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>>>> goto out;
>>>> }
>>>>
>>>> - switch (obj->cache_level) {
>>>> - case I915_CACHE_LLC:
>>>> - case I915_CACHE_L3_LLC:
>>>> - args->caching = I915_CACHING_CACHED;
>>>> - break;
>>>> + /*
>>>> +	 * This ioctl should be disabled for objects with pat_index
>>>> + * set by user space.
>>>> + */
>>>> + if (obj->pat_set_by_user) {
>>>> + err = -EOPNOTSUPP;
>>>> + goto out;
>>>> + }
>>>>
>>>> - case I915_CACHE_WT:
>>>> + if (i915_gem_object_has_cache_level(obj, I915_CACHE_LLC) ||
>>>> + i915_gem_object_has_cache_level(obj, I915_CACHE_L3_LLC))
>>>> + args->caching = I915_CACHING_CACHED;
>>>> + else if (i915_gem_object_has_cache_level(obj, I915_CACHE_WT))
>>>> args->caching = I915_CACHING_DISPLAY;
>>>> - break;
>>>> -
>>>> - default:
>>>> + else
>>>> args->caching = I915_CACHING_NONE;
>>>> - break;
>>>> - }
>>>> out:
>>>> rcu_read_unlock();
>>>> return err;
>>>> @@ -364,6 +364,15 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>>>> if (!obj)
>>>> return -ENOENT;
>>>>
>>>> + /*
>>>> +	 * This ioctl should be disabled for objects with pat_index
>>>> + * set by user space.
>>>> + */
>>>> + if (obj->pat_set_by_user) {
>>>> + ret = -EOPNOTSUPP;
>>>> + goto out;
>>>> + }
>>>> +
>>>> /*
>>>> * The caching mode of proxy object is handled by its generator, and
>>>> * not allowed to be changed by userspace.
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>> index 3aeede6aee4d..d42915516636 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
>>>> @@ -642,7 +642,7 @@ static inline int use_cpu_reloc(const struct reloc_cache *cache,
>>>>
>>>> return (cache->has_llc ||
>>>> obj->cache_dirty ||
>>>> - obj->cache_level != I915_CACHE_NONE);
>>>> + !i915_gem_object_has_cache_level(obj, I915_CACHE_NONE));
>>>> }
>>>>
>>>> static int eb_reserve_vma(struct i915_execbuffer *eb,
>>>> @@ -1323,8 +1323,10 @@ static void *reloc_iomap(struct i915_vma *batch,
>>>> offset = cache->node.start;
>>>> if (drm_mm_node_allocated(&cache->node)) {
>>>> ggtt->vm.insert_page(&ggtt->vm,
>>>> - i915_gem_object_get_dma_address(obj, page),
>>>> - offset, I915_CACHE_NONE, 0);
>>>> + i915_gem_object_get_dma_address(obj, page),
>>>> + offset,
>>>> + i915_gem_get_pat_index(ggtt->vm.i915, I915_CACHE_NONE),
>>>> + 0);
>>>> } else {
>>>> offset += page << PAGE_SHIFT;
>>>> }
>>>> @@ -1464,7 +1466,7 @@ eb_relocate_entry(struct i915_execbuffer *eb,
>>>> reloc_cache_unmap(&eb->reloc_cache);
>>>> mutex_lock(&vma->vm->mutex);
>>>> err = i915_vma_bind(target->vma,
>>>> - target->vma->obj->cache_level,
>>>> + target->vma->obj->pat_index,
>>>> PIN_GLOBAL, NULL, NULL);
>>>> mutex_unlock(&vma->vm->mutex);
>>>> reloc_cache_remap(&eb->reloc_cache, ev->vma->obj);
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_mman.c b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>>> index 3dbacdf0911a..50c30efa08a3 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_mman.c
>>>> @@ -383,7 +383,8 @@ static vm_fault_t vm_fault_gtt(struct vm_fault *vmf)
>>>> }
>>>>
>>>> /* Access to snoopable pages through the GTT is incoherent. */
>>>> - if (obj->cache_level != I915_CACHE_NONE && !HAS_LLC(i915)) {
>>>> + if (!(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
>>>> + HAS_LLC(i915))) {
>>>> ret = -EFAULT;
>>>> goto err_unpin;
>>>> }
>>>> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.c b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>>> index 8c70a0ec7d2f..46a19b099ec8 100644
>>>> --- a/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>>> +++ b/drivers/gpu/drm/i915/gem/i915_gem_object.c
>>>> @@ -54,6 +54,24 @@ unsigned int i915_gem_get_pat_index(struct drm_i915_private *i915,
>>>> return INTEL_INFO(i915)->cachelevel_to_pat[level];
>>>> }
>>>>
>>>> +bool i915_gem_object_has_cache_level(const struct drm_i915_gem_object *obj,
>>>> + enum i915_cache_level lvl)
>>>> +{
>>>> + /*
>>>> +	 * In case the pat_index is set by user space, this kernel mode
>>>> +	 * driver should leave coherency to be managed by user space and
>>>> +	 * simply return true here.
>>>> + */
>>>> + if (obj->pat_set_by_user)
>>>> + return true;
>>>
>>> This function gets called in a few different places; if the question is
>>> "does the object have specific cache behavior XX" then universally
>>> returning "yes" if the user has taken direct control of PAT seems
>>> confusing, especially since we're not looking at what cache level was
>>> being queried nor what PAT index was selected by userspace. A return
>>> value of 'true' here could be correct in some situations but not others,
>>> depending on how the caller intends to use that information. I assume
>>> you've gone through all the places this function gets called and ensured
>>> that a return value of true results in the expected behavior; you may
>>> want to elaborate on that in the commit message.
>> There is a patch later in the series disabling the {set|get}_caching ioctls
>> for objects with pat_index set by userspace. With that, the logic here
>> makes sure the kernel won't touch the cache settings for these objects.
>> One caller, vm_fault_gtt(), might be a bit sensitive though; CI doesn't
>> report any problem here, but if further changes were made in this part of
>> the code, the logic here might need to be refined.
>
> From a quick skim, it looks like there are callsites at:
> - gpu_write_needs_clflush()
> - i915_gem_object_set_cache_level()
> - i915_gem_get_caching_ioctl()
> - use_cpu_reloc()
> - vm_fault_gtt()
>
> For some of them like the ioctl function, we'll presumably just never
> get to the point in the function where it's called if userspace has
> taken control of PAT indices, but that's the kind of thing that needs to
> be clearly explained. This is a large change to how the driver
> works, and people are going to be looking to the commit messages,
> existing kerneldoc, etc. to figure out how things should work down the
> road. We can't just explain stuff in a mailing list thread and rely on
> people remembering the details months or years from now. There may be
> new places in the driver where i915_gem_object_has_cache_level() gets
> used in the future and the non-obvious "always returns true" behavior
> needs to be clearly documented to make sure that the new callers are
> able to handle that properly.
Added comments for gpu_write_needs_clflush(), use_cpu_reloc() and
vm_fault_gtt(). The other callers already have comments explaining when
to skip.
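For example, gpu_write_needs_clflush() now reads along these lines (a
sketch of the comment that was added, not the verbatim hunk):

	static bool gpu_write_needs_clflush(struct drm_i915_gem_object *obj)
	{
		struct drm_i915_private *i915 = to_i915(obj->base.dev);

		if (IS_DGFX(i915))
			return false;

		/*
		 * For objects with pat_index set by user space,
		 * i915_gem_object_has_cache_level() always returns true,
		 * so this returns false and the KMD leaves coherency
		 * (including any needed clflush) to user space.
		 */
		return !(i915_gem_object_has_cache_level(obj, I915_CACHE_NONE) ||
			 i915_gem_object_has_cache_level(obj, I915_CACHE_WT));
	}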
> So I'm not saying any of the code is wrong today, I'm just saying that
> we need to make sure it's properly documented so that it's maintainable
> in the future.
>
>
> Matt