[AMD Official Use Only]

Hi Christian,

This part was added by a commit whose message stated:

    When unloading driver after killing some applications, it will hit sdma
    flush tlb job timeout which is called by ttm_bo_delay_delete. So to avoid
    the job submit after fence driver fini, call ttm_bo_lock_delayed_workqueue
    before fence driver fini. And also put drm_sched_fini before waiting fence.

Since fence driver fini runs before the amdgpu ip fini process, I think I shouldn't move it into ttm_fini.

Best Regards,
Yubiao Wang

-----Original Message-----
From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
Sent: Thursday, August 5, 2021 8:36 PM
To: Wang, YuBiao <YuBiao.Wang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>; Quan, Evan <Evan.Quan@xxxxxxx>; Chen, Horace <Horace.Chen@xxxxxxx>; Tuikov, Luben <Luben.Tuikov@xxxxxxx>; Koenig, Christian <Christian.Koenig@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Xiao, Jack <Jack.Xiao@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Liu, Monk <Monk.Liu@xxxxxxx>; Xu, Feifei <Feifei.Xu@xxxxxxx>; Wang, Kevin(Yang) <Kevin1.Wang@xxxxxxx>
Subject: Re: [PATCH] drm/amd/amdgpu: skip locking delayed work if not initialized.

On 05.08.21 at 04:37, YuBiao Wang wrote:
> When init failed in early init stage, amdgpu_object has not been
> initialized, so hasn't the ttm delayed queue functions.
>
> Signed-off-by: YuBiao Wang <YuBiao.Wang@xxxxxxx>
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 9e53ff851496..4c33985542ed 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -3825,7 +3825,8 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>   {
>   	dev_info(adev->dev, "amdgpu: finishing device.\n");
>   	flush_delayed_work(&adev->delayed_init_work);
> -	ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
> +	if (adev->mman.initialized)
> +		ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);

I'm really wondering why we have that here in the first place.

This just disabled the delayed delete queue which is part of the sw stack and not related to hardware in any way possible.

I think it would be much cleaner to move this into amdgpu_ttm_fini().

Christian.

>   	adev->shutdown = true;
>
>   	/* make sure IB test finished before entering exclusive mode
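
For illustration only, the guard proposed in the patch can be sketched in plain userspace C. This is not kernel code: `struct mock_mman`, `mock_lock_delayed_workqueue()` and `mock_fini_hw()` are hypothetical stand-ins for `adev->mman`, `ttm_bo_lock_delayed_workqueue()` and `amdgpu_device_fini_hw()`, and the sketch only models the "skip the lock when TTM was never initialized" behaviour, not the teardown ordering discussed above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the TTM manager state (not the real struct amdgpu_mman). */
struct mock_mman {
    bool initialized;     /* set once the mock TTM layer has been set up */
    bool delayed_locked;  /* true after the delayed-delete queue was locked */
};

/*
 * Hypothetical stand-in for ttm_bo_lock_delayed_workqueue().
 * Returns -1 when called on an uninitialized manager; in this model that
 * represents touching a delayed work item that was never set up.
 */
int mock_lock_delayed_workqueue(struct mock_mman *mman)
{
    assert(mman != NULL);
    if (!mman->initialized)
        return -1;
    mman->delayed_locked = true;
    return 0;
}

/*
 * Mock teardown path with the guard from the patch: the lock is only
 * attempted when the manager was initialized, so an early-init failure
 * falls through without touching the delayed queue.
 */
int mock_fini_hw(struct mock_mman *mman)
{
    if (mman->initialized)
        return mock_lock_delayed_workqueue(mman);
    return 0;
}
```

With the guard, calling `mock_fini_hw()` on a manager that never finished init returns 0 and leaves the queue untouched, while calling `mock_lock_delayed_workqueue()` directly on it reports the error. Moving the call into the TTM teardown, as suggested in the review, would make the guard implicit, since that path would only run after a successful TTM init.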