Hi,

Thanks for your patch!

On 2023-03-07 02:07, Christian König wrote:
> On 07.03.23 at 08:02, YuBiao Wang wrote:
>> [Why]
>> For engines not supporting soft reset, i.e. VCN, there will be a failed
>> ib test before mode 1 reset during asic reset. The fences in this case
>> are never signaled and next time when we try to free the sa_bo, kernel
>> will hang.
>>
>> [How]
>> During pre_asic_reset, driver will clear job fences and afterwards the
>> fences' refcount will be reduced to 1. For drm_sched_jobs it will be
>> released in job_free_cb, and for non-sched jobs like ib_test, it's meant
>> to be released in sa_bo_free but only when the fences are signaled. So

So, you're missing a signal for the non-scheduler job fences?

>> we have to force signal the non_sched bad job's fence during
>> pre_asic_reset or the clear is not complete.

Do you want to add a function which does just this (signals non-scheduler
job fences) in amdgpu_device_pre_asic_reset(), and resubmit your patch?
(There will be code redundancy, but it may make the point clearer.)

Are we failing to signal non-scheduler job fences on reset altogether?
-- 
Regards,
Luben

>
> Well NAK for now. It looks once more like one of those not very well
> thought through changes.
>
> Luben can you please take a look at this and double check it?
>
> Thanks,
> Christian.
>
>>
>> Signed-off-by: YuBiao Wang <YuBiao.Wang@xxxxxxx>
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> index faff4a3f96e6..2e549bd50990 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
>> @@ -673,6 +673,7 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
>>  {
>>  	int i;
>>  	struct dma_fence *old, **ptr;
>> +	struct amdgpu_job *job;
>>
>>  	for (i = 0; i <= ring->fence_drv.num_fences_mask; i++) {
>>  		ptr = &ring->fence_drv.fences[i];
>> @@ -680,6 +681,9 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring)
>>  		if (old && old->ops == &amdgpu_job_fence_ops) {
>>  			RCU_INIT_POINTER(*ptr, NULL);
>>  			dma_fence_put(old);
>> +			job = container_of(old, struct amdgpu_job, hw_fence);
>> +			if (!job->base.s_fence && !dma_fence_is_signaled(old))
>> +				dma_fence_signal(old);
>>  		}
>>  	}
>>  }
>
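
To make the suggestion above concrete, here is a rough user-space sketch of
a separate "signal non-scheduler job fences" step that runs before the
fences are cleared. It is only a model, not kernel code: struct mock_fence,
struct mock_job, struct mock_ring and every mock_* function are hypothetical
stand-ins invented for illustration, and has_sched_fence merely models the
job->base.s_fence check from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT struct dma_fence / struct amdgpu_job. */
struct mock_fence {
	int refcount;
	bool signaled;
};

struct mock_job {
	bool has_sched_fence;		/* models job->base.s_fence != NULL */
	struct mock_fence hw_fence;
};

#define MOCK_NUM_SLOTS 4

struct mock_ring {
	struct mock_job *slots[MOCK_NUM_SLOTS];
};

static void mock_fence_signal(struct mock_fence *f)
{
	f->signaled = true;
}

static void mock_fence_put(struct mock_fence *f)
{
	if (f->refcount > 0)
		f->refcount--;
}

/*
 * Hypothetical helper: force-signal the still-unsignaled fences of jobs
 * that have no scheduler fence (e.g. an ib test). Returns how many
 * fences were signaled.
 */
static int mock_signal_non_sched_job_fences(struct mock_ring *ring)
{
	int count = 0;

	for (size_t i = 0; i < MOCK_NUM_SLOTS; i++) {
		struct mock_job *job = ring->slots[i];

		if (!job)
			continue;
		if (!job->has_sched_fence && !job->hw_fence.signaled) {
			mock_fence_signal(&job->hw_fence);
			count++;
		}
	}
	return count;
}

/* Models the clear step: empty each slot and drop the ring's reference. */
static void mock_clear_job_fences(struct mock_ring *ring)
{
	for (size_t i = 0; i < MOCK_NUM_SLOTS; i++) {
		struct mock_job *job = ring->slots[i];

		if (!job)
			continue;
		ring->slots[i] = NULL;
		mock_fence_put(&job->hw_fence);
	}
}

/* Models the sa_bo free path: the buffer is released only once signaled. */
static bool mock_sa_bo_can_free(const struct mock_fence *f)
{
	return f->signaled;
}
```

In this model the signal step runs first and the clear step second, so a
fence is never touched after the ring has dropped its reference to it.
Whether the real driver needs that ordering is a separate question for the
patch itself.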