Re: [PATCH] drm/kfd: fix ttm_bo_release warning

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Am 23.09.21 um 16:24 schrieb Yu, Lang:
[AMD Official Use Only]
-----Original Message-----
From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: Thursday, September 23, 2021 8:24 PM
To: Yu, Lang <Lang.Yu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Christian K nig
<C3B6christian.koenig@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>
Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning

Am 23.09.21 um 14:09 schrieb Yu, Lang:
[AMD Official Use Only]



-----Original Message-----
From: Koenig, Christian <Christian.Koenig@xxxxxxx>
Sent: Thursday, September 23, 2021 7:40 PM
To: Yu, Lang <Lang.Yu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Kuehling, Felix <Felix.Kuehling@xxxxxxx>; Christian K nig
<C3B6christian.koenig@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>
Subject: Re: [PATCH] drm/kfd: fix ttm_bo_release warning



Am 23.09.21 um 11:44 schrieb Lang Yu:
If a BO is pinned, unpin it before freeing it.

Call Trace:
	ttm_bo_put+0x30/0x50 [ttm]
	amdgpu_bo_unref+0x1e/0x30 [amdgpu]
	amdgpu_gem_object_free+0x34/0x50 [amdgpu]
	drm_gem_object_free+0x1d/0x30 [drm]
	amdgpu_amdkfd_gpuvm_free_memory_of_gpu+0x31f/0x3a0 [amdgpu]
	kfd_process_device_free_bos+0xa3/0xf0 [amdgpu]
	kfd_process_wq_release+0x224/0x2e0 [amdgpu]
	process_one_work+0x220/0x3c0
	worker_thread+0x4d/0x3f0
	kthread+0x114/0x150
	process_one_work+0x3c0/0x3c0
	kthread_park+0x90/0x90
	ret_from_fork+0x22/0x30

Signed-off-by: Lang Yu <lang.yu@xxxxxxx>
---
    drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +++
    1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 2d6b2d77b738..7e693b064072 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -1567,6 +1567,9 @@ int
amdgpu_amdkfd_gpuvm_free_memory_of_gpu(
    	pr_debug("Release VA 0x%llx - 0x%llx\n", mem->va,
    		mem->va + bo_size * (1 + mem->aql_queue));

+	if (mem->bo->tbo.pin_count)
+		amdgpu_bo_unpin(mem->bo);
+
NAK, using mem->bo->tbo.pin_count like this is illegal.
I didn't  get your point. I referred to function-"void
amdgpu_bo_unpin(struct amdgpu_bo *bo)", which uses it like this.
What amdgpu_bo_unpin() does is the following:

         ttm_bo_unpin(&bo->tbo);
         if (bo->tbo.pin_count)
                 return;
....

In other words we take further actions based on if the buffer us is still pinned or
not after an unpin operation.

What you try to do here is unpinning the BO when it is pinned independent if
somebody else or our code has pinned it before.
Hi Christian,

Thanks for your explanation and advice. I got your point.
Actually, these BOs are allocated and pinned during a kfd process lifecycle.
I will try to add a flag into struct kgd_mem to indicate which BO is pined
and should be unpinned. Which will make amdgpu_bo_pin/amdgpu_bo_unpin
calls balanced. Thanks!

That isn't to much better. The real solution would be to unpin them when the kfd process is destroyed.

Regards,
Christian.


Regards,
Lang
That can lead to all kind of problems and is clearly illegal.

Where was the BO pinned in the first place?
I found two places:

	ret = kfd_process_alloc_gpuvm(pdd, qpd->ib_base, PAGE_SIZE, flags,
				      &kaddr);

	ret = kfd_process_alloc_gpuvm(pdd, qpd->cwsr_base,
				      KFD_CWSR_TBA_TMA_SIZE, flags, &kaddr);
Well then you need to figure out where that memory is freed again and place the
unpin appropriately.

General rule of thumb is that calls to amdgpu_bo_pin/amdgpu_bo_unpin should
be balanced.

Regards,
Christian.

Regards,
Lang

Christian.

    	ret = unreserve_bo_and_vms(&ctx, false, false);

    	/* Remove from VM internal data structures */




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux