Hi Luben,

I was comparing the bad jobs from failed IB tests with the ones that cause a TDR, and I think the main difference is whether the job is submitted via drm_sched or not. In simple test cases the patch does not seem to incorrectly signal fences that shouldn't be signaled. We may indeed need heavier tests, but so far, based on static analysis, I haven't found the case you mentioned.

There is another case that uses direct job submission during reset, but it happens in recover_vram, which runs after the pre_asic reset, so I think it won't be affected.

I'll move these lines into a new function as you suggested and resend a v2 patch.

Regards,
Yubiao Wang

-----Original Message-----
From: Tuikov, Luben <Luben.Tuikov@xxxxxxx>
Sent: Wednesday, March 8, 2023 7:22 AM
To: Koenig, Christian <Christian.Koenig@xxxxxxx>; Wang, YuBiao <YuBiao.Wang@xxxxxxx>; amd-gfx@xxxxxxxxxxxxxxxxxxxxx
Cc: Chen, Horace <Horace.Chen@xxxxxxx>; Deucher, Alexander <Alexander.Deucher@xxxxxxx>; Zhang, Hawking <Hawking.Zhang@xxxxxxx>; Liu, Monk <Monk.Liu@xxxxxxx>; Xu, Feifei <Feifei.Xu@xxxxxxx>; Wang, Yang(Kevin) <KevinYang.Wang@xxxxxxx>
Subject: Re: [PATCH] drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs

On 2023-03-07 15:36, Luben Tuikov wrote:
> +	job = container_of(old, struct amdgpu_job, hw_fence);
> +	if (!job->base.s_fence && !dma_fence_is_signaled(old))
> +		dma_fence_signal(old);

Thinking about this more, is the !job->base.s_fence condition here enough to mean "non-sched jobs like ib_test"? I feel that it is a bit overloaded here--could we have this condition satisfied, yet we can't willy-nilly signal the fence here?

--
Regards,
Luben