[AMD Official Use Only] >-----Original Message----- >From: Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx> >Sent: Thursday, October 21, 2021 11:18 PM >To: Yu, Lang <Lang.Yu@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx >Subject: Re: FW: [PATCH 1/3] drm/amdgpu: fix a potential memory leak in >amdgpu_device_fini_sw() > >On 2021-10-21 3:19 a.m., Yu, Lang wrote: > >> [AMD Official Use Only] >> >> >> >>> -----Original Message----- >>> From: Yu, Lang <Lang.Yu@xxxxxxx> >>> Sent: Thursday, October 21, 2021 3:18 PM >>> To: Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx> >>> Cc: Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Koenig, Christian >>> <Christian.Koenig@xxxxxxx>; Huang, Ray <Ray.Huang@xxxxxxx>; Yu, Lang >>> <Lang.Yu@xxxxxxx> >>> Subject: [PATCH 1/3] drm/amdgpu: fix a potential memory leak in >>> amdgpu_device_fini_sw() >>> >>> amdgpu_fence_driver_sw_fini() should be executed before >>> amdgpu_device_ip_fini(), otherwise fence driver resource won't be >>> properly freed as adev->rings have been tore down. > > >Cam you clarify more where exactly the memleak happens ? > >Andrey See amdgpu_fence_driver_sw_fini(), ring->fence_drv.fences will only be freed when adev->rings[i] is not NULL. void amdgpu_fence_driver_sw_fini(struct amdgpu_device *adev) { unsigned int i, j; for (i = 0; i < AMDGPU_MAX_RINGS; i++) { struct amdgpu_ring *ring = adev->rings[i]; if (!ring || !ring->fence_drv.initialized) continue; if (!ring->no_scheduler) drm_sched_fini(&ring->sched); for (j = 0; j <= ring->fence_drv.num_fences_mask; ++j) dma_fence_put(ring->fence_drv.fences[j]); kfree(ring->fence_drv.fences); ring->fence_drv.fences = NULL; ring->fence_drv.initialized = false; } } If amdgpu_device_ip_fini() is executed before amdgpu_fence_driver_sw_fini(), amdgpu_device_ip_fini() will call gfx_vX_0_sw_fini() then call amdgpu_ring_fini() and set adev->rings[i] to NULL. Nothing will be freed in amdgpu_fence_driver_sw_fini(). ring->fence_drv.fences memory leak happened! void amdgpu_ring_fini(struct amdgpu_ring *ring) { ...... ring->adev->rings[ring->idx] = NULL; } Regards, Lang > > >>> >>> Fixes: 72c8c97b1522 ("drm/amdgpu: Split amdgpu_device_fini into early >>> and late") >>> >>> Signed-off-by: Lang Yu <lang.yu@xxxxxxx> >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> index 41ce86244144..5654c4790773 100644 >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>> @@ -3843,8 +3843,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device >>> *adev) >>> >>> void amdgpu_device_fini_sw(struct amdgpu_device *adev) { >>> - amdgpu_device_ip_fini(adev); >>> amdgpu_fence_driver_sw_fini(adev); >>> + amdgpu_device_ip_fini(adev); >>> release_firmware(adev->firmware.gpu_info_fw); >>> adev->firmware.gpu_info_fw = NULL; >>> adev->accel_working = false; >>> -- >>> 2.25.1