[AMD Official Use Only - Internal Distribution Only]

Hi Christian,

OK, I will investigate the memory leak further. But even if I fix this memory leak now, that can't guarantee there will be no more memory leaks in the future. A memory leak shouldn't crash the kernel and leave it unusable.

Best wishes
Emily Deng

>-----Original Message-----
>From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
>Sent: Tuesday, March 30, 2021 4:38 PM
>To: Deng, Emily <Emily.Deng@xxxxxxx>; Chen, Jiansong (Simon) <Jiansong.Chen@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>Subject: Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>
>Hi Emily,
>
>as I said, add a WARN_ON() and look at the backtrace.
>
>It could be that the backtrace then just shows the general cleanup functions, but it is at least a start.
>
>On the other hand, if you only see this sometimes, then we have some kind of race condition and need to dig deeper.
>
>Christian.
>
>On 30.03.21 at 10:19, Deng, Emily wrote:
>> Hi Christian,
>> Yes, I agree with both of you. But the issue occurs randomly during driver unload and at a fairly low rate, so it is hard to find where the memory leaks. Could you give some suggestions on how to debug this issue?
>>
>> Best wishes
>> Emily Deng
>>
>>> -----Original Message-----
>>> From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
>>> Sent: Tuesday, March 30, 2021 3:11 PM
>>> To: Deng, Emily <Emily.Deng@xxxxxxx>; Chen, Jiansong (Simon) <Jiansong.Chen@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>>> Subject: Re: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>
>>> Good morning,
>>>
>>> yes, Jiansong is right: that patch is really not a good idea.
>>>
>>> Moving buffers can indeed happen during shutdown while some memory is still referenced.
>>>
>>> Just ignoring the move is not the right approach; you need to find out why the memory is moved in the first place.
>>>
>>> You could add something like WARN_ON(adev->shutdown);
>>>
>>> Regards,
>>> Christian.
>>>
>>> On 30.03.21 at 09:05, Deng, Emily wrote:
>>>> Hi Jiansong,
>>>> It does happen. Maybe there is a race condition?
>>>>
>>>> Best wishes
>>>> Emily Deng
>>>>
>>>>> -----Original Message-----
>>>>> From: Chen, Jiansong (Simon) <Jiansong.Chen@xxxxxxx>
>>>>> Sent: Tuesday, March 30, 2021 2:49 PM
>>>>> To: Deng, Emily <Emily.Deng@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>>>>> Cc: Deng, Emily <Emily.Deng@xxxxxxx>
>>>>> Subject: RE: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>>>
>>>>> I still wonder how the issue happens. As far as I understand the driver model, the reference count of the device's kobject will not reach zero while there is still device memory access, so shutdown should not happen.
>>>>>
>>>>> Regards,
>>>>> Jiansong
>>>>>
>>>>> -----Original Message-----
>>>>> From: amd-gfx <amd-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Emily Deng
>>>>> Sent: Tuesday, March 30, 2021 12:42 PM
>>>>> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>>>>> Cc: Deng, Emily <Emily.Deng@xxxxxxx>
>>>>> Subject: [PATCH 6/6] drm/amdgpu: Fix driver unload issue
>>>>>
>>>>> During driver unload, skip the memory copy; otherwise it triggers call traces. For example, once sa_manager has been freed, amdgpu_sa_bo_new() emits a warning call trace.
>>>>>
>>>>> Signed-off-by: Emily Deng <Emily.Deng@xxxxxxx>
>>>>> ---
>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +++
>>>>>  1 file changed, 3 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> index e00263bcc88b..f0546a489e0d 100644
>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>>>>> @@ -317,6 +317,9 @@ int amdgpu_ttm_copy_mem_to_mem(struct amdgpu_device *adev,
>>>>>  	struct dma_fence *fence = NULL;
>>>>>  	int r = 0;
>>>>>  
>>>>> +	if (adev->shutdown)
>>>>> +		return 0;
>>>>> +
>>>>>  	if (!adev->mman.buffer_funcs_enabled) {
>>>>>  		DRM_ERROR("Trying to move memory with ring turned off.\n");
>>>>>  		return -EINVAL;
>>>>> --
>>>>> 2.25.1
>>>>>
>>>>> _______________________________________________
>>>>> amd-gfx mailing list
>>>>> amd-gfx@xxxxxxxxxxxxxxxxxxxxx
>>>>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
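Christian's suggestion, warning when a copy is requested during shutdown rather than silently returning 0 as the patch does, can be illustrated with a small user-space sketch. It is an analogy under stated assumptions: WARN_ON_SKETCH, warn_on_impl, struct fake_device and fake_copy_mem are hypothetical names, not kernel or amdgpu APIs, and the backtrace relies on glibc's backtrace()/backtrace_symbols_fd() when available. It shows that the warning identifies who requested the copy while the function otherwise behaves as before, whereas an early `return 0` hides the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#if defined(__GLIBC__)
#include <execinfo.h>
#include <unistd.h>
#endif

/* Number of warnings emitted so far (lets a test observe the sketch). */
static int warn_count;

/* Print the current call chain to stderr, if the C library supports it. */
static void dump_backtrace(void)
{
#if defined(__GLIBC__)
	void *frames[32];
	int n = backtrace(frames, 32);
	backtrace_symbols_fd(frames, n, STDERR_FILENO);
#else
	fputs("(backtrace not available on this libc)\n", stderr);
#endif
}

/* Hypothetical helper: report the condition, its location and a backtrace,
 * then return the condition. Execution continues, as with WARN_ON(). */
static bool warn_on_impl(bool cond, const char *expr, const char *file, int line)
{
	if (cond) {
		warn_count++;
		fprintf(stderr, "WARNING: %s at %s:%d\n", expr, file, line);
		dump_backtrace();
	}
	return cond;
}

/* User-space stand-in for the kernel's WARN_ON(); NOT the kernel macro. */
#define WARN_ON_SKETCH(cond) warn_on_impl(!!(cond), #cond, __FILE__, __LINE__)

/* Hypothetical stand-in for the two struct amdgpu_device fields used here. */
struct fake_device {
	bool shutdown;
	bool buffer_funcs_enabled;
};

/* Hypothetical routine loosely modelled on amdgpu_ttm_copy_mem_to_mem():
 * warn (so the backtrace names the caller) instead of skipping the copy. */
static int fake_copy_mem(struct fake_device *dev)
{
	assert(dev != NULL);

	WARN_ON_SKETCH(dev->shutdown);

	if (!dev->buffer_funcs_enabled) {
		fprintf(stderr, "Trying to move memory with ring turned off.\n");
		return -EINVAL;
	}
	return 0; /* the real function would schedule the copy here */
}
```

In the driver itself the equivalent is just a `WARN_ON(adev->shutdown);` near the top of amdgpu_ttm_copy_mem_to_mem(); as Christian notes, the backtrace may only show generic cleanup functions, but it is a starting point.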
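Jiansong's argument rests on reference counting: the final release only runs once the last reference is dropped. The sketch below models that rule in plain C. It is a hypothetical analogue, not the kernel's kref/kobject API; struct refcounted, ref_init, ref_get, ref_put and mark_released are illustrative names. It also shows what a leaked reference means in this model: a get without a matching put keeps the count above zero, so release() never runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reference-counted object, loosely analogous to a kref. */
struct refcounted {
	int refcount;
	bool released;
	void (*release)(struct refcounted *obj);
};

/* Start with one reference, owned by the creator. */
static void ref_init(struct refcounted *obj, void (*release)(struct refcounted *))
{
	obj->refcount = 1;
	obj->released = false;
	obj->release = release;
}

/* Take an additional reference; taking one on a dead object is a bug. */
static void ref_get(struct refcounted *obj)
{
	assert(obj->refcount > 0);
	obj->refcount++;
}

/* Drop a reference; returns true if it was the last one and release() ran. */
static bool ref_put(struct refcounted *obj)
{
	assert(obj->refcount > 0);
	if (--obj->refcount == 0) {
		obj->release(obj);
		return true;
	}
	return false;
}

/* Example release callback: in a driver, teardown would happen here. */
static void mark_released(struct refcounted *obj)
{
	obj->released = true;
}
```

The real kernel counterparts are kref_init(), kref_get() and kref_put(). Whether amdgpu's unload path actually waits on such a count before buffers can no longer be moved is exactly what the thread questions, so treat this only as a model of Jiansong's reasoning.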