Re: [PATCH 3/3] drm/amdgpu: Increase soft recovery timeout to .5s

On 08.03.24 at 23:31, Joshua Ashton wrote:
It definitely takes much longer than 10-20ms in some instances.

Some of these instances can even be shown in Friedrich's hang test suite -- specifically when there are a lot of page faults going on.

That's exactly the part I want to avoid. The context-based recovery is meant to break out of shaders with endless loops.

When there are page faults going on I would rather recommend a hard reset of the GPU.


The work (or parts of the work) could also be pending and not in any wave yet, just hanging out in the ring. There may be a better solution to that, but I don't know it.

Yeah, but killing any of that should never take longer than the original submission was supposed to take.

In other words, if we assume that we should get at least 20fps, then we should never go over 50ms. And even at that point we have already waited much longer than that for the shader to complete.
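The 50ms figure is just the per-frame time budget at 20fps. As a minimal sketch of that arithmetic (frame_budget_ms is a hypothetical helper for illustration, not something in amdgpu):

```c
#include <assert.h>

/*
 * Hypothetical helper, not part of the driver: per-frame time budget
 * in whole milliseconds for a given minimum frame rate.
 */
unsigned int frame_budget_ms(unsigned int min_fps)
{
    assert(min_fps > 0);      /* a zero frame rate has no budget */
    return 1000u / min_fps;   /* integer division, rounds down */
}
```

With min_fps = 20 this gives 50ms, so the proposed 500ms deadline spans ten such frames, while the current 10ms deadline is a fifth of one.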

If you really want to raise it that high, I would rather make it configurable.

Regards,
Christian.


Raising it to .5s still makes sense to me.

- Joshie 🐸✨

On 3/8/24 08:29, Christian König wrote:
On 07.03.24 at 20:04, Joshua Ashton wrote:
Results in much more reliable soft recovery on
Steam Deck.

Waiting 500ms for a locked-up shader is way too long, I think. We could increase the 10ms to something like 20ms, but I really wouldn't go much over that.

This code just kills shaders which are stuck in an endless loop. When that takes longer than 10-20ms, we really have a hardware problem which needs a full reset to resolve.

Regards,
Christian.


Signed-off-by: Joshua Ashton <joshua@xxxxxxxxx>

Cc: Friedrich Vock <friedrich.vock@xxxxxx>
Cc: Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx>
Cc: Christian König <christian.koenig@xxxxxxx>
Cc: André Almeida <andrealmeid@xxxxxxxxxx>
Cc: stable@xxxxxxxxxxxxxxx
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 57c94901ed0a..be99db0e077e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -448,7 +448,7 @@ bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid,
      spin_unlock_irqrestore(fence->lock, flags);
      atomic_inc(&ring->adev->gpu_reset_counter);
-    deadline = ktime_add_us(ktime_get(), 10000);
+    deadline = ktime_add_ms(ktime_get(), 500);
      while (!dma_fence_is_signaled(fence) &&
             ktime_to_ns(ktime_sub(deadline, ktime_get())) > 0)
          ring->funcs->soft_recovery(ring, vmid);
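For readers who want to experiment with the deadline value outside the kernel, the retry loop in the hunk above can be approximated in userspace. This is only a sketch under stated assumptions: clock_gettime(CLOCK_MONOTONIC) stands in for ktime_get(), the two callbacks stand in for dma_fence_is_signaled() and ring->funcs->soft_recovery(), and struct fake_fence together with every function name below is hypothetical and does not exist in amdgpu.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() */
#endif
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Monotonic time in nanoseconds; userspace stand-in for ktime_get(). */
static int64_t now_ns(void)
{
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000LL + ts.tv_nsec;
}

/* Hypothetical callbacks replacing the fence check and the ring hook. */
typedef bool (*is_signaled_fn)(void *ctx);
typedef void (*recover_fn)(void *ctx);

/*
 * Keep calling recover() until is_signaled() reports true or
 * timeout_ms has elapsed. Returns true if the fence signaled.
 */
bool soft_recover_with_deadline(is_signaled_fn is_signaled,
                                recover_fn recover, void *ctx,
                                unsigned int timeout_ms)
{
    int64_t deadline;

    assert(is_signaled && recover);
    deadline = now_ns() + (int64_t)timeout_ms * 1000000LL;

    while (!is_signaled(ctx) && deadline - now_ns() > 0)
        recover(ctx);

    return is_signaled(ctx);
}

/* Hypothetical fence that signals after a fixed number of recovery
 * attempts; a negative kicks_needed means it never signals. */
struct fake_fence {
    long kicks_needed;
    unsigned long kicks;
};

bool fake_signaled(void *ctx)
{
    struct fake_fence *f = ctx;

    return f->kicks_needed >= 0 &&
           f->kicks >= (unsigned long)f->kicks_needed;
}

void fake_recover(void *ctx)
{
    struct fake_fence *f = ctx;

    f->kicks++;
}
```

Passing 10 or 500 as timeout_ms reproduces the two deadlines under discussion. Keeping the timeout as a parameter rather than a constant is also one way a configurable value could be plumbed through, although how the driver would expose such a setting is not covered here.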






