[AMD Official Use Only] >-----Original Message----- >From: Koenig, Christian <Christian.Koenig@xxxxxxx> >Sent: Thursday, September 23, 2021 10:52 PM >To: Yu, Lang <Lang.Yu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >Cc: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Huang, Ray ><Ray.Huang@xxxxxxx> >Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning > >Am 23.09.21 um 16:24 schrieb Yu, Lang: >> [AMD Official Use Only] >>> -----Original Message----- >>> From: Koenig, Christian <Christian.Koenig@xxxxxxx> >>> Sent: Thursday, September 23, 2021 8:24 PM >>> To: Yu, Lang <Lang.Yu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>> Cc: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Christian K nig >>> <C3B6christian.koenig@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx> >>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>> >>> Am 23.09.21 um 14:09 schrieb Yu, Lang: >>>> [AMD Official Use Only] >>>> >>>> >>>> >>>>> -----Original Message----- >>>>> From: Koenig, Christian <Christian.Koenig@xxxxxxx> >>>>> Sent: Thursday, September 23, 2021 7:40 PM >>>>> To: Yu, Lang <Lang.Yu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>>>> Cc: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Christian K nig >>>>> <C3B6christian.koenig@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx> >>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>>>> >>>>> >>>>> >>>>> Am 23.09.21 um 11:44 schrieb Lang Yu: >>>>>> If a BO is pinned, unpin it before freeing it. >>>>>> >>>>>> Call Trace: >>>>>> ttm_bo_put+0x30/0x50 [ttm] >>>>>> amdgpu_bo_unref+0x1e/0x30 [amdgpu] >>>>>> amdgpu_gem_object_free+0x34/0x50 [amdgpu] >>>>>> drm_gem_object_free+0x1d/0x30 [drm] >>>>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu] >>>>>> kfd_process_device_free_bos+0xa3/0xf0 [amdgpu] >>>>>> kfd_process_wq_release+0x224/0x2e0 [amdgpu] >>>>>> process_one_work+0x220/0x3c0 >>>>>> worker_thread+0x4d/0x3f0 >>>>>> kthread+0x114/0x150 >>>>>> process_one_work+0x3c0/0x3c0 >>>>>> kthread_park+0x90/0x90 >>>>>> ret_from_fork+0x22/0x30 >>>>>> >>>>>> Signed-off-by: Lang Yu <lang.yu@xxxxxxx> >>>>>> --- >>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++ >>>>>> 1 file changed, 3 insertions(+) >>>>>> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>> index 2d6b2d77b738..7e693b064072 100644 >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>> @@ -1567,6 +1567,9 @@ int >>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu( >>>>>> pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va, >>>>>> mem->va + bo_size * (1 + mem->aql_queue)); >>>>>> >>>>>> + if (mem->bo->tbo.pin_count) >>>>>> + amdgpu_bo_unpin(mem->bo); >>>>>> + >>>>> NAK, using mem->bo->tbo.pin_count like this is illegal. >>>> I didn't get your point. I referred to function-"void >>>> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this. >>> What amdgpu_bo_unpin() does is the following: >>> >>> ttm_bo_unpin(&bo->tbo); >>> if (bo->tbo.pin_count) >>> return; >>> .... >>> >>> In other words we take further actions based on if the buffer us is >>> still pinned or not after an unpin operation. >>> >>> What you try to do here is unpinning the BO when it is pinned >>> independent if somebody else or our code has pinned it before. >> Hi Christian, >> >> Thanks for your explanation and advice. I got your point. >> Actually, these BOs are allocated and pinned during a kfd process lifecycle. >> I will try to add a flag into struct kgd_mem to indicate which BO is >> pined and should be unpinned. Which will make >> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks! > >That isn't to much better. The real solution would be to unpin them when the kfd >process is destroyed. Yes, will unpin them when the kfd process is destroyed. But we should indicate which BO is pinned and should be unpinned. Right? Regards, Lang >Regards, >Christian. > >> >> Regards, >> Lang >>> That can lead to all kind of problems and is clearly illegal. >>> >>>>> Where was the BO pinned in the first place? >>>> I found two places: >>>> >>>> ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags, >>>> &kaddr); >>>> >>>> ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base, >>>> KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr); >>> Well then you need to figure out where that memory is freed again and >>> place the unpin appropriately. >>> >>> General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin >>> should be balanced. >>> >>> Regards, >>> Christian. >>> >>>> Regards, >>>> Lang >>>> >>>>> Christian. >>>>> >>>>>> ret = unreserve_bo_and_vms(&ctx, false, false); >>>>>> >>>>>> /* Remove from VM internal data structures */