Re: [PATCH v2] drm/amdgpu: fix the memleak caused by fence not released

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2/14/2025 6:09 PM, Christian König wrote:
Yeah, completely agree.

But not checking the syncobj handle before doing the update is actually even more problematic than leaking the memory.

This could be used by userspace to put the kernel into a broken situation it can't come out any more.

Arvin can you take care of the complete fix?
Sure, I will do that.


Thanks,

~Arvind

Thanks,
Christian.

Am 14.02.25 um 13:14 schrieb YuanShang Mao (River):
[AMD Official Use Only - AMD Internal Distribution Only]

Better to put the fence outside amdgpu_gem_va_update_vm. Since it is passed to the caller, and the caller must keep one reference at least until this fence is no longer needed.

Thanks
River

-----Original Message-----
From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Yadav, Arvind
Sent: Friday, February 14, 2025 7:42 PM
To: Koenig, Christian <Christian.Koenig@xxxxxxx>; Ma, Le <Le.Ma@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Yadav, Arvind <Arvind.Yadav@xxxxxxx>
Cc: Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Lazar, Lijo <Lijo.Lazar@xxxxxxx>
Subject: Re: [PATCH v2] drm/amdgpu: fix the memleak caused by fence not released


On 2/14/2025 4:08 PM, Christian König wrote:
Adding Arvind, please make sure to keep him in the loop.

Am 14.02.25 um 11:07 schrieb Le Ma:
On systems with CONFIG_SLUB_DEBUG enabled, the memleak like below
will show up explicitly during driver unloading if created bo without
drm_timeline object before.

      BUG drm_sched_fence (Tainted: G           OE     ): Objects remaining in drm_sched_fence on __kmem_cache_shutdown()
      -----------------------------------------------------------------------------
      Call Trace:
      <TASK>
      dump_stack_lvl+0x4c/0x70
      dump_stack+0x14/0x20
      slab_err+0xb0/0xf0
      ? srso_alias_return_thunk+0x5/0xfbef5
      ? flush_work+0x12/0x20
      ? srso_alias_return_thunk+0x5/0xfbef5
      __kmem_cache_shutdown+0x163/0x2e0
      kmem_cache_destroy+0x61/0x170
      drm_sched_fence_slab_fini+0x19/0x900

Thus call dma_fence_put properly to avoid the memleak.

v2: call dma_fence_put in amdgpu_gem_va_update_vm

Signed-off-by: Le Ma <le.ma@xxxxxxx>
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 +++++++--
   1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 8b67aae6c2fe..00f1f34705c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -759,7 +759,8 @@ static struct dma_fence *
   amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
                      struct amdgpu_vm *vm,
                      struct amdgpu_bo_va *bo_va,
-                    uint32_t operation)
+                    uint32_t operation,
+                    uint32_t syncobj_handle)
   {
      struct dma_fence *fence = dma_fence_get_stub();
      int r;
@@ -771,6 +772,9 @@ amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
      if (r)
              goto error;

+    if (!syncobj_handle)
+            dma_fence_put(fence);
+
Having that check inside amdgpu_gem_update_bo_mapping() was actually correct. Here it doesn't make much sense.
Agreed,

Regards,
~Arvind

      if (operation == AMDGPU_VA_OP_MAP ||
          operation == AMDGPU_VA_OP_REPLACE) {
              r = amdgpu_vm_bo_update(adev, bo_va, false); @@ -965,7 +969,8 @@
int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
                                                  &timeline_chain);
Right before this here is a call to amdgpu_gem_update_timeline_node() which is incorrectly placed.

That needs to come much earlier, above the switch (args->operation)....

Regards,
Christian.

              fence = amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
-                                            args->operation);
+                                            args->operation,
+                                            args->vm_timeline_syncobj_out);

              if (!r)
                      amdgpu_gem_update_bo_mapping(filp, bo_va,



[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux