[AMD Official Use Only] >-----Original Message----- >From: Koenig, Christian <Christian.Koenig@xxxxxxx> >Sent: Friday, September 24, 2021 1:54 PM >To: Yu, Lang <Lang.Yu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >Cc: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Huang, Ray ><Ray.Huang@xxxxxxx> >Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning > > >Am 24.09.21 um 07:50 schrieb Yu, Lang: >> [AMD Official Use Only] >>> [SNIP] >>>>>> Hi Christian, >>>>>> >>>>>> Thanks for your explanation and advice. I got your point. >>>>>> Actually, these BOs are allocated and pinned during a kfd process lifecycle. >>>>>> I will try to add a flag into struct kgd_mem to indicate which BO >>>>>> is pined and should be unpinned. Which will make >>>>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks! >>>>> That isn't to much better. The real solution would be to unpin them >>>>> when the kfd process is destroyed. >>>> Yes, will unpin them when the kfd process is destroyed. >>>> But we should indicate which BO is pinned and should be unpinned. Right? >>> Well not with a flag or something like that. >>> >>> The knowledge which BO is pinned and needs to be unpinned should come >>> from the control logic and not be papered over by some general handling. >>> That's the background why we have removed the general handling for >>> this from TTM in the first place. >>> >>> In other words when you need to pin a BO because it is kmapped it >>> should be unpinned when it is kunmapped and if you don't kunmap at >>> all then there is something wrong with the code structure from a higher level >point of view. >> Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a >> kmap, but without a kunmap when the kfd process is destroyed. The flag >actually indicates kmap/kunmap. > >Well that's the wrong approach then. I mean you need to have the BO reference >and the mapped pointer somewhere, don't you? > >How do you clean those up? They are respectively cleaned by " kfd_process_device_free_bos " and " kfd_process_destroy_pdds". Let me put the code here. Thanks! diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h index ec028cf963f5..d65b3bf13fd8 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h @@ -81,6 +81,7 @@ struct kgd_mem { bool aql_queue; bool is_imported; + bool is_mapped_to_kernel; }; /* KFD Memory Eviction */ @@ -278,6 +279,8 @@ int amdgpu_amdkfd_gpuvm_sync_memory( struct kgd_dev *kgd, struct kgd_mem *mem, bool intr); int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd, struct kgd_mem *mem, void **kptr, uint64_t *size); +void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd, + struct kgd_mem *mem); int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info, struct dma_fence **ef); int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd, diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 2d6b2d77b738..45ccbe9f63ee 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c @@ -1857,6 +1857,8 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd, amdgpu_bo_unreserve(bo); + mem->is_mapped_to_kernel = true; + mutex_unlock(&mem->process_info->lock); return 0; @@ -1870,6 +1872,20 @@ int amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel(struct kgd_dev *kgd, return ret; } +void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct kgd_dev *kgd, struct kgd_mem *mem) +{ + struct amdgpu_bo *bo = mem->bo; + + if (!mem->is_mapped_to_kernel) + return; + + amdgpu_bo_reserve(bo, true); + amdgpu_bo_kunmap(bo); + amdgpu_bo_unpin(bo); + amdgpu_bo_unreserve(bo); + mem->is_mapped_to_kernel = false; +} + int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct kgd_dev *kgd, struct kfd_vm_fault_info *mem) { diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c index 21ec8a18cad2..f5506b153aed 100644 --- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c +++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c @@ -941,6 +941,8 @@ static void kfd_process_device_free_bos(struct kfd_process_device *pdd) peer_pdd->dev->kgd, mem, peer_pdd->drm_priv); } + amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(pdd->dev->kgd, mem); + amdgpu_amdkfd_gpuvm_free_memory_of_gpu(pdd->dev->kgd, mem, pdd->drm_priv, NULL); kfd_process_device_remove_obj_handle(pdd, id); Regards, Lang >Regards, >Christian.