If the IH ring buffer overflows, it's possible that fence signal events
were lost. Check each ring for progress to prevent job timeouts/GPU
hangs due to the fences staying unsignaled despite the work being done.

Cc: Joshua Ashton <joshua@xxxxxxxxx>
Cc: Alex Deucher <alexander.deucher@xxxxxxx>
Cc: Christian König <christian.koenig@xxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
Signed-off-by: Friedrich Vock <friedrich.vock@xxxxxx>
---
v2: Set ih->overflow to false after processing fences

 drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
index f3b0aaf3ebc6..4e061f7741d8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c
@@ -209,6 +209,7 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
 {
 	unsigned int count;
 	u32 wptr;
+	int i;
 
 	if (!ih->enabled || adev->shutdown)
 		return IRQ_NONE;
@@ -227,6 +228,21 @@ int amdgpu_ih_process(struct amdgpu_device *adev, struct amdgpu_ih_ring *ih)
 		ih->rptr &= ih->ptr_mask;
 	}
 
+	/* If the ring buffer overflowed, we might have lost some fence
+	 * signal interrupts. Check if there was any activity so the signal
+	 * doesn't get lost.
+	 */
+	if (ih->overflow) {
+		for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+			struct amdgpu_ring *ring = adev->rings[i];
+
+			if (!ring || !ring->fence_drv.initialized)
+				continue;
+			amdgpu_fence_process(ring);
+		}
+		ih->overflow = false;
+	}
+
 	amdgpu_ih_set_rptr(adev, ih);
 	wake_up_all(&ih->wait_process);
 
-- 
2.43.0