[PATCH] drm/amdgpu: no job timeout setting on compute queues

Alexander.Deucher@xxxxxxx (Deucher, Alexander) · Fri, 16 Mar 2018 16:14:28 +0000

Since GPU reset is not enabled yet anyway, a timeout will just print a message. Can we just change amdgpu_lockup_timeout to MAX_SCHEDULE_TIMEOUT until we enable GPU reset?


Alex

________________________________
From: Evan Quan <evan.quan@xxxxxxx>
Sent: Friday, March 16, 2018 12:52:32 AM
To: amd-gfx at lists.freedesktop.org
Cc: Deucher, Alexander; Quan, Evan
Subject: [PATCH] drm/amdgpu: no job timeout setting on compute queues

Under some heavy compute test (dgemm) environments, it may take
the ASIC more than 50 seconds to finish a single dispatched job,
which triggers the timeout. This is quite annoying, although it
does not seem to cause any real problems.
As a quick workaround, we choose not to enforce the timeout
setting on compute queues.

Change-Id: I210011a90898617367e897a90e9f8fb2639281a3
Signed-off-by: Evan Quan <evan.quan at amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index 008e198..455a81e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
@@ -435,7 +435,9 @@ int amdgpu_fence_driver_init_ring(struct amdgpu_ring *ring,
         if (ring->funcs->type != AMDGPU_RING_TYPE_KIQ) {
                 r = drm_sched_init(&ring->sched, &amdgpu_sched_ops,
                                    num_hw_submission, amdgpu_job_hang_limit,
-                                  msecs_to_jiffies(amdgpu_lockup_timeout), ring->name);
+                                  (ring->funcs->type == AMDGPU_RING_TYPE_COMPUTE) ?
+                                  MAX_SCHEDULE_TIMEOUT : msecs_to_jiffies(amdgpu_lockup_timeout),
+                                  ring->name);
                 if (r) {
                         DRM_ERROR("Failed to create scheduler on ring %s.\n",
                                   ring->name);
--
2.7.4
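
For readers outside the kernel tree, the per-ring timeout selection in the
hunk above can be sketched as a standalone function. This is only an
illustration under stated assumptions, not the driver code: every
sketch_-prefixed name and the SKETCH_HZ tick rate are hypothetical stand-ins,
and the millisecond conversion is a simplified version of what
msecs_to_jiffies() does. The only kernel fact relied on is that
MAX_SCHEDULE_TIMEOUT is defined as LONG_MAX.

```c
#include <limits.h>

/* Hypothetical stand-ins for kernel definitions, for illustration only. */
#define SKETCH_HZ 250L                        /* assumed tick rate */
#define SKETCH_MAX_SCHEDULE_TIMEOUT LONG_MAX  /* kernel value of MAX_SCHEDULE_TIMEOUT */

/* Reduced ring-type enum; the real driver has more ring types. */
enum sketch_ring_type {
	SKETCH_RING_GFX,
	SKETCH_RING_COMPUTE,
};

/* Simplified millisecond-to-tick conversion, rounding up. */
static long sketch_msecs_to_jiffies(unsigned int ms)
{
	return ((long)ms * SKETCH_HZ + 999L) / 1000L;
}

/*
 * Policy from the patch: compute rings get an effectively infinite
 * timeout, all other rings keep the configured lockup timeout.
 */
static long sketch_ring_timeout(enum sketch_ring_type type,
				unsigned int lockup_timeout_ms)
{
	if (type == SKETCH_RING_COMPUTE)
		return SKETCH_MAX_SCHEDULE_TIMEOUT;
	return sketch_msecs_to_jiffies(lockup_timeout_ms);
}
```

With an illustrative lockup timeout of 10000 ms, a compute ring gets
LONG_MAX while a gfx ring gets 2500 ticks under the assumed tick rate. The
alternative raised in the reply, changing amdgpu_lockup_timeout itself,
would correspond to returning the "never expire" value for every ring type
rather than branching on the ring type.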
