[PATCH] drm/sched: fix the bug of time out calculation

Monk Liu <Monk.Liu@xxxxxxx> · Tue, 24 Aug 2021 17:51:32 +0800

the original logic is wrong that the timeout will not be retriggerd
after the previous job siganled, and that lead to the scenario that all
jobs in the same scheduler shares the same timeout timer from the very
begining job in this scheduler which is wrong.

we should modify the timer everytime a previous job signaled.

Signed-off-by: Monk Liu <Monk.Liu@xxxxxxx>
---
 drivers/gpu/drm/scheduler/sched_main.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index a2a9536..fb27025 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -235,6 +235,13 @@ static void drm_sched_start_timeout(struct drm_gpu_scheduler *sched)
 		schedule_delayed_work(&sched->work_tdr, sched->timeout);
 }
 
+static void drm_sched_restart_timeout(struct drm_gpu_scheduler *sched)
+{
+	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
+	    !list_empty(&sched->pending_list))
+		mod_delayed_work(system_wq, &sched->work_tdr, sched->timeout);
+}
+
 /**
  * drm_sched_fault - immediately start timeout handler
  *
@@ -693,6 +700,11 @@ drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 	if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
 		/* remove job from pending_list */
 		list_del_init(&job->list);
+
+		/* once the job deleted from pending list we should restart
+		 * the timeout calculation for the next job.
+		 */
+		drm_sched_restart_timeout(sched);
 		/* make the scheduled timestamp more accurate */
 		next = list_first_entry_or_null(&sched->pending_list,
 						typeof(*next), list);
-- 
2.7.4