This is a note to let you know that I've just added the patch titled drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs to the 6.1-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: drm-amdgpu-force-signal-hw_fences-that-are-embedded-.patch and it can be found in the queue-6.1 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. commit 9b2bd985d8d074d8061b433eb03448dd428526a4 Author: YuBiao Wang <YuBiao.Wang@xxxxxxx> Date: Thu Mar 16 11:30:32 2023 +0800 drm/amdgpu: Force signal hw_fences that are embedded in non-sched jobs [ Upstream commit 033c56474acf567a450f8bafca50e0b610f2b716 ] [Why] For engines not supporting soft reset, i.e. VCN, there will be a failed ib test before mode 1 reset during asic reset. The fences in this case are never signaled and next time when we try to free the sa_bo, kernel will hang. [How] During pre_asic_reset, driver will clear job fences and afterwards the fences' refcount will be reduced to 1. For drm_sched_jobs it will be released in job_free_cb, and for non-sched jobs like ib_test, it's meant to be released in sa_bo_free but only when the fences are signaled. So we have to force signal the non_sched bad job's fence during pre_asic_reset or the clear is not complete. Signed-off-by: YuBiao Wang <YuBiao.Wang@xxxxxxx> Acked-by: Luben Tuikov <luben.tuikov@xxxxxxx> Signed-off-by: Alex Deucher <alexander.deucher@xxxxxxx> Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c index 6fdb679321d0d..3cc1929285fc0 100644 --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c @@ -624,6 +624,15 @@ void amdgpu_fence_driver_clear_job_fences(struct amdgpu_ring *ring) ptr = &ring->fence_drv.fences[i]; old = rcu_dereference_protected(*ptr, 1); if (old && old->ops == &amdgpu_job_fence_ops) { + struct amdgpu_job *job; + + /* For non-scheduler bad job, i.e. failed ib test, we need to signal + * it right here or we won't be able to track them in fence_drv + * and they will remain unsignaled during sa_bo free. + */ + job = container_of(old, struct amdgpu_job, hw_fence); + if (!job->base.s_fence && !dma_fence_is_signaled(old)) + dma_fence_signal(old); RCU_INIT_POINTER(*ptr, NULL); dma_fence_put(old); }