Re: [PATCH v2] drm/amdkfd: make sure VM is ready for updating operations

Looks valid to me offhand, but it's really Felix who needs to judge this.

On the other hand, if it blocks any CI, feel free to add my Acked-by and submit it.

Christian.

On 16.04.24 at 04:05, Yu, Lang wrote:
[Public]

ping

-----Original Message-----
From: Yu, Lang <Lang.Yu@xxxxxxx>
Sent: Thursday, April 11, 2024 4:11 PM
To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Koenig, Christian <Christian.Koenig@xxxxxxx>; Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Yu, Lang <Lang.Yu@xxxxxxx>
Subject: [PATCH v2] drm/amdkfd: make sure VM is ready for updating operations

If page table BOs were evicted and are not validated before the page tables
are updated, the VM is still in the evicting state, amdgpu_vm_update_range()
returns -EBUSY, and restore_process_worker retries forever.

v2: Split the BO validation and page table update into two separate loops in
amdgpu_amdkfd_gpuvm_restore_process_bos(). (Felix)
  1. Validate BOs
  2. Validate VM (and DMABuf attachments)
  3. Update page tables for the BOs validated above

Fixes: 2fdba514ad5a ("drm/amdgpu: Auto-validate DMABuf imports in compute VMs")

Signed-off-by: Lang Yu <Lang.Yu@xxxxxxx>
---
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  | 34 +++++++++++--------
1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 0ae9fd844623..e2c9e6ddb1d1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2900,13 +2900,12 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *

       amdgpu_sync_create(&sync_obj);

-      /* Validate BOs and map them to GPUVM (update VM page tables). */
+      /* Validate BOs managed by KFD */
       list_for_each_entry(mem, &process_info->kfd_bo_list,
                           validate_list) {

               struct amdgpu_bo *bo = mem->bo;
               uint32_t domain = mem->domain;
-              struct kfd_mem_attachment *attachment;
               struct dma_resv_iter cursor;
               struct dma_fence *fence;

@@ -2931,6 +2930,25 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
                               goto validate_map_fail;
                       }
               }
+      }
+
+      if (failed_size)
+              pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
+
+      /* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
+       * validations above would invalidate DMABuf imports again.
+       */
+      ret = process_validate_vms(process_info, &exec.ticket);
+      if (ret) {
+              pr_debug("Validating VMs failed, ret: %d\n", ret);
+              goto validate_map_fail;
+      }
+
+      /* Update mappings managed by KFD. */
+      list_for_each_entry(mem, &process_info->kfd_bo_list,
+                          validate_list) {
+              struct kfd_mem_attachment *attachment;
+
               list_for_each_entry(attachment, &mem->attachments, list) {
                       if (!attachment->is_mapped)
                               continue;
@@ -2947,18 +2965,6 @@ int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu *
               }
       }

-      if (failed_size)
-              pr_debug("0x%lx/0x%lx in system\n", failed_size, total_size);
-
-      /* Validate PDs, PTs and evicted DMABuf imports last. Otherwise BO
-       * validations above would invalidate DMABuf imports again.
-       */
-      ret = process_validate_vms(process_info, &exec.ticket);
-      if (ret) {
-              pr_debug("Validating VMs failed, ret: %d\n", ret);
-              goto validate_map_fail;
-      }
-
       /* Update mappings not managed by KFD */
       list_for_each_entry(peer_vm, &process_info->vm_list_head,
                       vm_list_node) {
--
2.25.1



