[AMD Official Use Only - AMD Internal Distribution Only] Better to put the fence outside amdgpu_gem_va_update_vm. Since it is passed to the caller, and the caller must keep one reference at least until this fence is no longer needed. Thanks River -----Original Message----- From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Yadav, Arvind Sent: Friday, February 14, 2025 7:42 PM To: Koenig, Christian <Christian.Koenig@xxxxxxx>; Ma, Le <Le.Ma@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Yadav, Arvind <Arvind.Yadav@xxxxxxx> Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Lazar, Lijo <Lijo.Lazar@xxxxxxx> Subject: Re: [PATCH v2] drm/amdgpu: fix the memleak caused by fence not released On 2/14/2025 4:08 PM, Christian König wrote: > Adding Arvind, please make sure to keep him in the loop. > > Am 14.02.25 um 11:07 schrieb Le Ma: >> On systems with CONFIG_SLUB_DEBUG enabled, the memleak like below >> will show up explicitly during driver unloading if created bo without >> drm_timeline object before. >> >> BUG drm_sched_fence (Tainted: G OE ): Objects remaining in drm_sched_fence on __kmem_cache_shutdown() >> ----------------------------------------------------------------------------- >> Call Trace: >> <TASK> >> dump_stack_lvl+0x4c/0x70 >> dump_stack+0x14/0x20 >> slab_err+0xb0/0xf0 >> ? srso_alias_return_thunk+0x5/0xfbef5 >> ? flush_work+0x12/0x20 >> ? srso_alias_return_thunk+0x5/0xfbef5 >> __kmem_cache_shutdown+0x163/0x2e0 >> kmem_cache_destroy+0x61/0x170 >> drm_sched_fence_slab_fini+0x19/0x900 >> >> Thus call dma_fence_put properly to avoid the memleak. >> >> v2: call dma_fence_put in amdgpu_gem_va_update_vm >> >> Signed-off-by: Le Ma <le.ma@xxxxxxx> >> --- >> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 +++++++-- >> 1 file changed, 7 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >> index 8b67aae6c2fe..00f1f34705c0 100644 >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c >> @@ -759,7 +759,8 @@ static struct dma_fence * >> amdgpu_gem_va_update_vm(struct amdgpu_device *adev, >> struct amdgpu_vm *vm, >> struct amdgpu_bo_va *bo_va, >> - uint32_t operation) >> + uint32_t operation, >> + uint32_t syncobj_handle) >> { >> struct dma_fence *fence = dma_fence_get_stub(); >> int r; >> @@ -771,6 +772,9 @@ amdgpu_gem_va_update_vm(struct amdgpu_device *adev, >> if (r) >> goto error; >> >> + if (!syncobj_handle) >> + dma_fence_put(fence); >> + > Having that check inside amdgpu_gem_update_bo_mapping() was actually correct. Here it doesn't make much sense. Agreed, Regards, ~Arvind > >> if (operation == AMDGPU_VA_OP_MAP || >> operation == AMDGPU_VA_OP_REPLACE) { >> r = amdgpu_vm_bo_update(adev, bo_va, false); @@ -965,7 +969,8 @@ >> int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data, >> &timeline_chain); > Right before this here is a call to amdgpu_gem_update_timeline_node() which is incorrectly placed. > > That needs to come much earlier, above the switch (args->operation).... > > Regards, > Christian. > >> >> fence = amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va, >> - args->operation); >> + args->operation, >> + args->vm_timeline_syncobj_out); >> >> if (!r) >> amdgpu_gem_update_bo_mapping(filp, bo_va,