Re: [PATCH] drm/kfd: fix ttm_bo_release warning

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 24.09.21 um 08:34 schrieb Yu, Lang:
[AMD Official Use Only]



-----Original Message-----
From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: Friday, September 24, 2021 1:54 PM
To: Yu, Lang <Lang.Yu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Huang, Ray
<Ray.Huang@xxxxxxx>
Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning


Am 24.09.21 um 07:50 schrieb Yu, Lang:
[AMD Official Use Only]
[SNIP]
Hi Christian,

Thanks for your explanation and advice. I got your point.
Actually, these BOs are allocated and pinned during a kfd process lifecycle.
I will try to add a flag into struct kgd_mem to indicate which BO
is pined and should be unpinned. Which will make
amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks!
That isn't to much better. The real solution would be to unpin them
when the kfd process is destroyed.
Yes, will unpin them when the kfd process is destroyed.
But we should indicate which BO is pinned and should be unpinned. Right?
Well not with a flag or something like that.

The knowledge which BO is pinned and needs to be unpinned should come
from the control logic and not be papered over by some general handling.
That's the background why we have removed the general handling for
this from TTM in the first place.

In other words when you need to pin a BO because it is kmapped it
should be unpinned when it is kunmapped and if you don't kunmap at
all then there is something wrong with the code structure from a higher level
point of view.
Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a
kmap, but without a kunmap when the kfd process is destroyed. The flag
actually indicates kmap/kunmap.

Well that's the wrong approach then. I mean you need to have the BO reference
and the mapped pointer somewhere, don't you?

How do you clean those up?
They are respectively cleaned by " kfd_process_device_free_bos " and " kfd_process_destroy_pdds".
Let me put the code here. Thanks!

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index ec028cf963f5..d65b3bf13fd8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -81,6 +81,7 @@ struct kgd_mem {

         bool aql_queue;
         bool is_imported;
+       bool is_mapped_to_kernel;

Yeah, that is exactly what you absolutely should NOT do.

  };

  /* KFD Memory Eviction */
@@ -278,6 +279,8 @@ int amdgpu_amdkfd_gpuvm_sync_memory(
                 struct kgd_dev *kgd, struct kgd_mem *mem, bool intr);
  int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
                 struct kgd_mem *mem, void **kptr, uint64_t *size);

The real question is who is calling this function here?

+void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd,
+               struct kgd_mem *mem);
  int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info,
                                             struct dma_fence **ef);
  int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 2d6b2d77b738..45ccbe9f63ee 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1857,6 +1857,8 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,

         amdgpu_bo_unreserve(bo);

+       mem->is_mapped_to_kernel = true;
+
         mutex_unlock(&mem->process_info->lock);
         return 0;

@@ -1870,6 +1872,20 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd,
         return ret;
  }

+void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd, struct kgd_mem *mem)
+{
+       struct amdgpu_bo *bo = mem->bo;
+
+       if (!mem->is_mapped_to_kernel)
+               return;
+
+       amdgpu_bo_reserve(bo, true);
+       amdgpu_bo_kunmap(bo);
+       amdgpu_bo_unpin(bo);
+       amdgpu_bo_unreserve(bo);
+       mem->is_mapped_to_kernel = false;
+}
+
  int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd,
                                               struct kfd_vm_fault_info *mem)
  {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 21ec8a18cad2..f5506b153aed 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -941,6 +941,8 @@ static void kfd_process_device_free_bos(struct kfd_process_device *pdd)
                                 peer_pdd->dev->kgd, mem, peer_pdd->drm_priv);
                 }

+               amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(pdd->dev->kgd, mem);
+

That's a general cleanup function for user space allocations and should not be abused for stuff like that.

Regards,
Christian.

                 amdgpu_amdkfd_gpuvm_free_memory_of_gpu(pdd->dev->kgd, mem,
                                                        pdd->drm_priv, NULL);
                 kfd_process_device_remove_obj_handle(pdd, id);

Regards,
Lang

Regards,
Christian.




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux