[AMD Official Use Only] >-----Original Message----- >From: Koenig, Christian <Christian.Koenig@xxxxxxx> >Sent: Friday, September 24, 2021 1:42 PM >To: Yu, Lang <Lang.Yu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >Cc: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Huang, Ray ><Ray.Huang@xxxxxxx> >Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning > >Am 24.09.21 um 07:35 schrieb Yu, Lang: >> [AMD Official Use Only] >>> -----Original Message----- >>> From: Koenig, Christian <Christian.Koenig@xxxxxxx> >>> Sent: Thursday, September 23, 2021 10:52 PM >>> To: Yu, Lang <Lang.Yu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>> Cc: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Huang, Ray >>> <Ray.Huang@xxxxxxx> >>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>> >>> Am 23.09.21 um 16:24 schrieb Yu, Lang: >>>> [AMD Official Use Only] >>>>> -----Original Message----- >>>>> From: Koenig, Christian <Christian.Koenig@xxxxxxx> >>>>> Sent: Thursday, September 23, 2021 8:24 PM >>>>> To: Yu, Lang <Lang.Yu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>>>> Cc: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Christian K nig >>>>> <C3B6christian.koenig@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx> >>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>>>> >>>>> Am 23.09.21 um 14:09 schrieb Yu, Lang: >>>>>> [AMD Official Use Only] >>>>>> >>>>>> >>>>>> >>>>>>> -----Original Message----- >>>>>>> From: Koenig, Christian <Christian.Koenig@xxxxxxx> >>>>>>> Sent: Thursday, September 23, 2021 7:40 PM >>>>>>> To: Yu, Lang <Lang.Yu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >>>>>>> Cc: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Christian K nig >>>>>>> <C3B6christian.koenig@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx> >>>>>>> Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning >>>>>>> >>>>>>> >>>>>>> >>>>>>> Am 23.09.21 um 11:44 schrieb Lang Yu: >>>>>>>> If a BO is pinned, unpin it before freeing it. >>>>>>>> >>>>>>>> Call Trace: >>>>>>>> ttm_bo_put+0x30/0x50 [ttm] >>>>>>>> amdgpu_bo_unref+0x1e/0x30 [amdgpu] >>>>>>>> amdgpu_gem_object_free+0x34/0x50 [amdgpu] >>>>>>>> drm_gem_object_free+0x1d/0x30 [drm] >>>>>>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 >[amdgpu] >>>>>>>> kfd_process_device_free_bos+0xa3/0xf0 [amdgpu] >>>>>>>> kfd_process_wq_release+0x224/0x2e0 [amdgpu] >>>>>>>> process_one_work+0x220/0x3c0 >>>>>>>> worker_thread+0x4d/0x3f0 >>>>>>>> kthread+0x114/0x150 >>>>>>>> process_one_work+0x3c0/0x3c0 >>>>>>>> kthread_park+0x90/0x90 >>>>>>>> ret_from_fork+0x22/0x30 >>>>>>>> >>>>>>>> Signed-off-by: Lang Yu <lang.yu@xxxxxxx> >>>>>>>> --- >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++ >>>>>>>> 1 file changed, 3 insertions(+) >>>>>>>> >>>>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>>>> index 2d6b2d77b738..7e693b064072 100644 >>>>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c >>>>>>>> @@ -1567,6 +1567,9 @@ int >>>>> amdgpu_amdkfd_gpuvm_free_memory_of_gpu( >>>>>>>> pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va, >>>>>>>> mem->va + bo_size * (1 + mem->aql_queue)); >>>>>>>> >>>>>>>> + if (mem->bo->tbo.pin_count) >>>>>>>> + amdgpu_bo_unpin(mem->bo); >>>>>>>> + >>>>>>> NAK, using mem->bo->tbo.pin_count like this is illegal. >>>>>> I didn't get your point. I referred to function-"void >>>>>> amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this. >>>>> What amdgpu_bo_unpin() does is the following: >>>>> >>>>> ttm_bo_unpin(&bo->tbo); >>>>> if (bo->tbo.pin_count) >>>>> return; >>>>> .... >>>>> >>>>> In other words we take further actions based on if the buffer us is >>>>> still pinned or not after an unpin operation. >>>>> >>>>> What you try to do here is unpinning the BO when it is pinned >>>>> independent if somebody else or our code has pinned it before. >>>> Hi Christian, >>>> >>>> Thanks for your explanation and advice. I got your point. >>>> Actually, these BOs are allocated and pinned during a kfd process lifecycle. >>>> I will try to add a flag into struct kgd_mem to indicate which BO is >>>> pined and should be unpinned. Which will make >>>> amdgpu_bo_pin/amdgpu_bo_unpin calls balanced. Thanks! >>> That isn't to much better. The real solution would be to unpin them >>> when the kfd process is destroyed. >> Yes, will unpin them when the kfd process is destroyed. >> But we should indicate which BO is pinned and should be unpinned. Right? > >Well not with a flag or something like that. > >The knowledge which BO is pinned and needs to be unpinned should come from >the control logic and not be papered over by some general handling. >That's the background why we have removed the general handling for this from >TTM in the first place. > >In other words when you need to pin a BO because it is kmapped it should be >unpinned when it is kunmapped and if you don't kunmap at all then there is >something wrong with the code structure from a higher level point of view. Yes, this function "amdgpu_amdkfd_gpuvm_map_gtt_bo_to_kernel" did a kmap, but without a kunmap when the kfd process is destroyed. The flag actually indicates kmap/kunmap. Regards, Lang >Regards, >Christian. > >> >> Regards, >> Lang >> >>> Regards, >>> Christian. >>> >>>> Regards, >>>> Lang >>>>> That can lead to all kind of problems and is clearly illegal. >>>>> >>>>>>> Where was the BO pinned in the first place? >>>>>> I found two places: >>>>>> >>>>>> ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags, >>>>>> &kaddr); >>>>>> >>>>>> ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base, >>>>>> KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr); >>>>> Well then you need to figure out where that memory is freed again >>>>> and place the unpin appropriately. >>>>> >>>>> General rule of thumb is that calls to >>>>> amdgpu_bo_pin/amdgpu_bo_unpin should be balanced. >>>>> >>>>> Regards, >>>>> Christian. >>>>> >>>>>> Regards, >>>>>> Lang >>>>>> >>>>>>> Christian. >>>>>>> >>>>>>>> ret = unreserve_bo_and_vms(&ctx, false, false); >>>>>>>> >>>>>>>> /* Remove from VM internal data structures */