Re: [PATCH 06/13] drm/amdgpu: use the new drm_exec object for CS v2

On 20.06.23 06:07, Tatsuyuki Ishi wrote:
+Boris and +Matthew in case you want to take over this patch set.

Here are some review / testing comments, including those I posted before to ease tracking.

On 5/4/23 20:51, Christian König wrote:
Use the new component here as well and remove the old handling.

v2: drop duplicate handling

Signed-off-by: Christian König <christian.koenig@xxxxxxx>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu.h         |   1 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.c |  71 ++-----
  drivers/gpu/drm/amd/amdgpu/amdgpu_bo_list.h |   5 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c      | 210 +++++++++-----------
  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.h      |   7 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c      |  22 --
  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h      |   3 -
  7 files changed, 115 insertions(+), 204 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index 02b827785e39..eba3e4f01ea6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -133,6 +141,8 @@ int amdgpu_bo_list_create(struct amdgpu_device *adev, struct drm_file *filp,
        list->first_userptr = first_userptr;
      list->num_entries = num_entries;
+    sort(array, last_entry, sizeof(struct amdgpu_bo_list_entry),
+         amdgpu_bo_list_entry_cmp, NULL);

Previously amdgpu_bo_list_get_list sorted all entries, but this one only sorts userptr entries. I think this changes behavior?

The intention here is to sort all entries except the userptrs. Need to double check.
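
For reference, a minimal sketch of the two sort calls being compared, assuming the usual bo_list array layout (regular BOs filling [0, last_entry), userptr BOs filling [first_userptr, num_entries)); the second call is purely illustrative, since the old amdgpu_bo_list_get_list ordered all entries by priority when building the validation list:

        /* what the hunk above does: sort only the regular (non-userptr) entries */
        sort(array, last_entry, sizeof(struct amdgpu_bo_list_entry),
             amdgpu_bo_list_entry_cmp, NULL);

        /* hypothetical whole-array variant, for comparison only */
        sort(array, num_entries, sizeof(struct amdgpu_bo_list_entry),
             amdgpu_bo_list_entry_cmp, NULL);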


@@ -928,18 +874,56 @@ static int amdgpu_cs_parser_bos(struct amdgpu_cs_parser *p,
          e->user_invalidated = userpage_invalidated;
      }

-    r = ttm_eu_reserve_buffers(&p->ticket, &p->validated, true,
-                   &duplicates);
-    if (unlikely(r != 0)) {
-        if (r != -ERESTARTSYS)
-            DRM_ERROR("ttm_eu_reserve_buffers failed.\n");
-        goto out_free_user_pages;
+    drm_exec_while_not_all_locked(&p->exec) {
+        r = amdgpu_vm_lock_pd(&fpriv->vm, &p->exec);
+        drm_exec_continue_on_contention(&p->exec);

Duplicate handling is needed for pretty much every call to amdgpu_vm_lock_pd, since bo->tbo.base.resv == vm->root.bo->tbo.base.resv for AMDGPU_GEM_CREATE_VM_ALWAYS_VALID BOs.

Well, no. AMDGPU_GEM_CREATE_VM_ALWAYS_VALID means that those BOs should *not* be part of the relocation list. So when they cause an EALREADY here, userspace has a bug.

I think Boris's suggestion of handling this through a common DRM_EXEC_FLAG_ALLOW_DUPLICATES flag fits well.

No, again: the only driver which should accept duplicates is radeon; for all other drivers, especially new ones, duplicates should probably be rejected.

We only allow this for radeon because it is already UAPI. It could be that we need to do this for amdgpu as well, but I really hope we don't.
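
To make the EALREADY point concrete, here is a hedged sketch (not part of the patch; signatures taken from the hunks quoted in this mail) of how an ALWAYS_VALID BO would show up as a duplicate:

        /* Illustrative only: a BO created with AMDGPU_GEM_CREATE_VM_ALWAYS_VALID
         * shares the reservation object of the VM root PD, so locking it again
         * within the same drm_exec acquire context trips the ww_mutex duplicate
         * detection and returns -EALREADY. */
        static int example_lock_always_valid(struct amdgpu_cs_parser *p,
                                             struct amdgpu_fpriv *fpriv,
                                             struct amdgpu_bo *bo)
        {
                int r;

                r = amdgpu_vm_lock_pd(&fpriv->vm, &p->exec);    /* locks the PD resv */
                if (r)
                        return r;

                /* bo->tbo.base.resv == fpriv->vm.root.bo->tbo.base.resv here, so
                 * this returns -EALREADY; per the reply above, amdgpu should
                 * treat that as a userspace bug rather than paper over it with
                 * something like DRM_EXEC_FLAG_ALLOW_DUPLICATES. */
                return drm_exec_prepare_obj(&p->exec, &bo->tbo.base, 2);
        }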


+        if (unlikely(r))
+            goto out_free_user_pages;
+
+        amdgpu_bo_list_for_each_entry(e, p->bo_list) {
+            r = drm_exec_prepare_obj(&p->exec, &e->bo->tbo.base, 2);

Previously there were comments explaining how the fence count is calculated; now they seem to have been removed. I'd prefer that they be retained, since finding out who calls dma_resv_add_fence is not trivial, and a wrong reservation count can be really hard to debug.

I need to double check this; the reservation count looks incorrect in the first place.


Likewise for amdgpu_vm_lock_pd (which was added in another commit).
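
For illustration, the kind of annotation being asked for might look like this (the numbers are only inferred from the submit hunk further down and are not verified, so treat it as a sketch):

            /* Reservation sketch: amdgpu_cs_submit() adds one READ fence per
             * gang member other than the leader plus one WRITE fence for the
             * leader, i.e. gang_size fences per locked object, so a fixed
             * count of 2 only covers gangs of at most two jobs and probably
             * needs to depend on p->gang_size. */
            r = drm_exec_prepare_obj(&p->exec, &e->bo->tbo.base, 2);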

+            drm_exec_break_on_contention(&p->exec);
+            if (unlikely(r))
+                goto out_free_user_pages;
+
+            e->bo_va = amdgpu_vm_bo_find(vm, e->bo);
+            e->range = NULL;
+        }
+        drm_exec_continue_on_contention(&p->exec);
+
+        if (p->uf_bo) {
+            r = drm_exec_prepare_obj(&p->exec, &p->uf_bo->tbo.base,
+                         2);
+            drm_exec_continue_on_contention(&p->exec);
+            if (unlikely(r))
+                goto out_free_user_pages;
+        }
      }

-    amdgpu_bo_list_for_each_entry(e, p->bo_list) {
-        struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
+    amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
+        struct mm_struct *usermm;

-        e->bo_va = amdgpu_vm_bo_find(vm, bo);
+        usermm = amdgpu_ttm_tt_get_usermm(e->bo->tbo.ttm);
+        if (usermm && usermm != current->mm) {
+            r = -EPERM;
+            goto out_free_user_pages;
+        }
+
+        if (amdgpu_ttm_tt_is_userptr(e->bo->tbo.ttm) &&
+            e->user_invalidated && e->user_pages) {
+            amdgpu_bo_placement_from_domain(e->bo,
+                            AMDGPU_GEM_DOMAIN_CPU);
+            r = ttm_bo_validate(&e->bo->tbo, &e->bo->placement,
+                        &ctx);
+            if (r)
+                goto out_free_user_pages;
+
+            amdgpu_ttm_tt_set_user_pages(e->bo->tbo.ttm,
+                             e->user_pages);
+        }
+
+        kvfree(e->user_pages);
+        e->user_pages = NULL;
      }
      amdgpu_cs_get_threshold_for_moves(p->adev, &p->bytes_moved_threshold,
@@ -1296,9 +1271,8 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
       */
      r = 0;
      amdgpu_bo_list_for_each_userptr_entry(e, p->bo_list) {
-        struct amdgpu_bo *bo = ttm_to_amdgpu_bo(e->tv.bo);
-
-        r |= !amdgpu_ttm_tt_get_user_pages_done(bo->tbo.ttm, e->range);
+        r |= !amdgpu_ttm_tt_get_user_pages_done(e->bo->tbo.ttm,
+                            e->range);
          e->range = NULL;

e->range = NULL; needs to be removed, or it's causing *massive* memory leaks.


What? That sounds like nonsense to me; amdgpu_ttm_tt_get_user_pages_done() frees the range it gets.
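
A minimal sketch of the ownership model described in that reply (hedged; whether it matches what the leak report observed is exactly the open question here):

        /* As described above: amdgpu_ttm_tt_get_user_pages_done() consumes and
         * frees the hmm range it is handed, so e->range is stale after the call
         * either way; clearing it then only guards against a later double use
         * and does not free or leak anything by itself. */
        r |= !amdgpu_ttm_tt_get_user_pages_done(e->bo->tbo.ttm, e->range);
        e->range = NULL;        /* drop the now-stale pointer */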


      }
      if (r) {
@@ -1307,20 +1281,22 @@ static int amdgpu_cs_submit(struct amdgpu_cs_parser *p,
      }
        p->fence = dma_fence_get(&leader->base.s_fence->finished);
-    list_for_each_entry(e, &p->validated, tv.head) {
+    drm_exec_for_each_locked_object(&p->exec, index, gobj) {
+
+        ttm_bo_move_to_lru_tail_unlocked(&gem_to_amdgpu_bo(gobj)->tbo);
            /* Everybody except for the gang leader uses READ */
          for (i = 0; i < p->gang_size; ++i) {
              if (p->jobs[i] == leader)
                  continue;

-            dma_resv_add_fence(e->tv.bo->base.resv,
+            dma_resv_add_fence(gobj->resv,
                       &p->jobs[i]->base.s_fence->finished,
                         DMA_RESV_USAGE_READ);
          }

-        /* The gang leader is remembered as writer */
-        e->tv.num_shared = 0;
+        /* The gang leader as remembered as writer */
+        dma_resv_add_fence(gobj->resv, p->fence, DMA_RESV_USAGE_WRITE);
      }

Previously the PD used READ accesses; now everything is WRITE. This probably isn't right.

That shouldn't matter. We should switch to using BOOKKEEP at some point, but for now that's irrelevant.
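
For context on READ vs. WRITE vs. BOOKKEEP, a hedged sketch of the dma_resv usage levels being discussed (the BOOKKEEP variant is only the possible future direction mentioned above, not something this patch does):

        /* Implicit sync: readers wait on KERNEL/WRITE fences, writers wait on
         * KERNEL/WRITE/READ fences, and BOOKKEEP fences are ignored unless a
         * caller asks for them explicitly.  A stronger usage therefore only
         * adds synchronization, it never loses any, which is why moving the PD
         * from READ to WRITE should be harmless, just stricter than necessary. */
        dma_resv_add_fence(gobj->resv, &p->jobs[i]->base.s_fence->finished,
                           DMA_RESV_USAGE_READ);                /* other gang members */
        dma_resv_add_fence(gobj->resv, p->fence, DMA_RESV_USAGE_WRITE); /* leader */
        /* eventual alternative mentioned above, invisible to implicit sync:
         * dma_resv_add_fence(gobj->resv, p->fence, DMA_RESV_USAGE_BOOKKEEP); */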

Thanks,
Christian.


Regards,
Tatsuyuki



