Re: [PATCH v2] drm/amdgpu: fix the memleak caused by fence not released

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Adding Arvind, please make sure to keep him in the loop.

Am 14.02.25 um 11:07 schrieb Le Ma:
> On systems with CONFIG_SLUB_DEBUG enabled, the memleak like below
> will show up explicitly during driver unloading if created bo without
> drm_timeline object before.
>
>     BUG drm_sched_fence (Tainted: G           OE     ): Objects remaining in drm_sched_fence on __kmem_cache_shutdown()
>     -----------------------------------------------------------------------------
>     Call Trace:
>     <TASK>
>     dump_stack_lvl+0x4c/0x70
>     dump_stack+0x14/0x20
>     slab_err+0xb0/0xf0
>     ? srso_alias_return_thunk+0x5/0xfbef5
>     ? flush_work+0x12/0x20
>     ? srso_alias_return_thunk+0x5/0xfbef5
>     __kmem_cache_shutdown+0x163/0x2e0
>     kmem_cache_destroy+0x61/0x170
>     drm_sched_fence_slab_fini+0x19/0x900
>
> Thus call dma_fence_put properly to avoid the memleak.
>
> v2: call dma_fence_put in amdgpu_gem_va_update_vm
>
> Signed-off-by: Le Ma <le.ma@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 +++++++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 8b67aae6c2fe..00f1f34705c0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -759,7 +759,8 @@ static struct dma_fence *
>  amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>  			struct amdgpu_vm *vm,
>  			struct amdgpu_bo_va *bo_va,
> -			uint32_t operation)
> +			uint32_t operation,
> +			uint32_t syncobj_handle)
>  {
>  	struct dma_fence *fence = dma_fence_get_stub();
>  	int r;
> @@ -771,6 +772,9 @@ amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>  	if (r)
>  		goto error;
>  
> +	if (!syncobj_handle)
> +		dma_fence_put(fence);
> +

Having that check inside amdgpu_gem_update_bo_mapping() was actually correct. Here it doesn't make much sense.

>  	if (operation == AMDGPU_VA_OP_MAP ||
>  	    operation == AMDGPU_VA_OP_REPLACE) {
>  		r = amdgpu_vm_bo_update(adev, bo_va, false);
> @@ -965,7 +969,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>  						    &timeline_chain);

Right before this here is a call to amdgpu_gem_update_timeline_node() which is incorrectly placed.

That needs to come much earlier, above the switch (args->operation)....

Regards,
Christian.

>  
>  		fence = amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
> -						args->operation);
> +						args->operation,
> +						args->vm_timeline_syncobj_out);
>  
>  		if (!r)
>  			amdgpu_gem_update_bo_mapping(filp, bo_va,




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux