[AMD Official Use Only] >-----Original Message----- >From: Koenig, Christian <Christian.Koenig@xxxxxxx> >Sent: Thursday, September 23, 2021 8:24 PM >To: Yu, Lang <Lang.Yu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >Cc: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Christian K nig ><C3B6christian.koenig@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx> >Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning > >Am 23.09.21 um 14:09 schrieb Yu, Lang: >> [AMD Official Use Only] >> >> >> >>> -----Original Message----- >>> From: Koenig, Christian <Christian.Koenig@xxxxxxx> >>> Sent: Thursday, September 23, 2021 7:40 PM >>> To: Yu, Lang <Lang.Yu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>> Cc: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Christian K nig >>> <C3B6christian.koenig@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx> >>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>> >>> >>> >>> Am 23.09.21 um 11:44 schrieb Lang Yu: >>>> If a BO is pinned, unpin it before freeing it. >>>> >>>> Call Trace: >>>> ttm_bo_put+0x30/0x50 [ttm] >>>> amdgpu_bo_unref+0x1e/0x30 [amdgpu] >>>> amdgpu_gem_object_free+0x34/0x50 [amdgpu] >>>> drm_gem_object_free+0x1d/0x30 [drm] >>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu] >>>> kfd_process_device_free_bos+0xa3/0xf0 [amdgpu] >>>> kfd_process_wq_release+0x224/0x2e0 [amdgpu] >>>> process_one_work+0x220/0x3c0 >>>> worker_thread+0x4d/0x3f0 >>>> kthread+0x114/0x150 >>>> process_one_work+0x3c0/0x3c0 >>>> kthread_park+0x90/0x90 >>>> ret_from_fork+0x22/0x30 >>>> >>>> Signed-off-by: Lang Yu <lang.yu@xxxxxxx> >>>> --- >>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++ >>>> 1 file changed, 3 insertions(+) >>>> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>> index 2d6b2d77b738..7e693b064072 100644 >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>> @@ -1567,6 +1567,9 @@ int >amdgpu_amdkfd_gpuvm_free_memory_of_gpu( >>>> pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va, >>>> mem->va + bo_size * (1 + mem->aql_queue)); >>>> >>>> + if (mem->bo->tbo.pin_count) >>>> + amdgpu_bo_unpin(mem->bo); >>>> + >>> NAK, using mem->bo->tbo.pin_count like this is illegal. >> I didn't get your point. I referred to function-"void >> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this. > >What amdgpu_bo_unpin() does is the following: > > ttm_bo_unpin(&bo->tbo); > if (bo->tbo.pin_count) > return; >.... > >In other words we take further actions based on if the buffer us is still pinned or >not after an unpin operation. > >What you try to do here is unpinning the BO when it is pinned independent if >somebody else or our code has pinned it before. Hi Christian, Thanks for your explanation and advice. I got your point. Actually, these BOs are allocated and pinned during a kfd process lifecycle. I will try to add a flag into struct kgd_mem to indicate which BO is pined and should be unpinned. Which will make amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks! Regards, Lang > >That can lead to all kind of problems and is clearly illegal. > >>> Where was the BO pinned in the first place? >> I found two places: >> >> ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags, >> &kaddr); >> >> ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base, >> KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr); > >Well then you need to figure out where that memory is freed again and place the >unpin appropriately. > >General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin should >be balanced. > >Regards, >Christian. > >> Regards, >> Lang >> >>> Christian. >>> >>>> ret = unreserve_bo_and_vms(&ctx, false, false); >>>> >>>> /* Remove from VM internal data structures */