Patch "drm/xe: Don't short circuit TDR on jobs not started" has been added to the 6.11-stable tree

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



This is a note to let you know that I've just added the patch titled

    drm/xe: Don't short circuit TDR on jobs not started

to the 6.11-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     drm-xe-don-t-short-circuit-tdr-on-jobs-not-started.patch
and it can be found in the queue-6.11 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@xxxxxxxxxxxxxxx> know about it.



commit 133a844e2efa74fa38eb3be012c531cbefec816a
Author: Matthew Brost <matthew.brost@xxxxxxxxx>
Date:   Fri Oct 25 14:43:29 2024 -0700

    drm/xe: Don't short circuit TDR on jobs not started
    
    [ Upstream commit fe05cee4d9533892210e1ee90147175d87e7c053 ]
    
    Short circuiting TDR on jobs not started is an optimization which is not
    required. On LNL we are facing an issue where jobs do not get scheduled
    by the GuC if it misses a GGTT page update. When this occurs let the TDR
    fire, toggle the scheduling which may get the job unstuck, and print a
    warning message. If the TDR fires twice on job that hasn't started,
    timeout the job.
    
    v2:
     - Add warning message (Paulo)
     - Add fixes tag (Paulo)
     - Timeout job which hasn't started after TDR firing twice
    v3:
     - Include local change
    v4:
     - Short circuit check_timeout on job not started
     - use warn level rather than notice (Paulo)
    
    Fixes: 7ddb9403dd74 ("drm/xe: Sample ctx timestamp to determine if jobs have timed out")
    Cc: stable@xxxxxxxxxxxxxxx
    Cc: Paulo Zanoni <paulo.r.zanoni@xxxxxxxxx>
    Signed-off-by: Matthew Brost <matthew.brost@xxxxxxxxx>
    Reviewed-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
    Link: https://patchwork.freedesktop.org/patch/msgid/20241025214330.2010521-2-matthew.brost@xxxxxxxxx
    Signed-off-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
    (cherry picked from commit 35d25a4a0012e690ef0cc4c5440231176db595cc)
    Signed-off-by: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
    Signed-off-by: Sasha Levin <sashal@xxxxxxxxxx>

diff --git a/drivers/gpu/drm/xe/xe_guc_submit.c b/drivers/gpu/drm/xe/xe_guc_submit.c
index dfd809e7bbd25..cbdd44567d107 100644
--- a/drivers/gpu/drm/xe/xe_guc_submit.c
+++ b/drivers/gpu/drm/xe/xe_guc_submit.c
@@ -989,12 +989,22 @@ static void xe_guc_exec_queue_lr_cleanup(struct work_struct *w)
 static bool check_timeout(struct xe_exec_queue *q, struct xe_sched_job *job)
 {
 	struct xe_gt *gt = guc_to_gt(exec_queue_to_guc(q));
-	u32 ctx_timestamp = xe_lrc_ctx_timestamp(q->lrc[0]);
-	u32 ctx_job_timestamp = xe_lrc_ctx_job_timestamp(q->lrc[0]);
+	u32 ctx_timestamp, ctx_job_timestamp;
 	u32 timeout_ms = q->sched_props.job_timeout_ms;
 	u32 diff;
 	u64 running_time_ms;
 
+	if (!xe_sched_job_started(job)) {
+		xe_gt_warn(gt, "Check job timeout: seqno=%u, lrc_seqno=%u, guc_id=%d, not started",
+			   xe_sched_job_seqno(job), xe_sched_job_lrc_seqno(job),
+			   q->guc->id);
+
+		return xe_sched_invalidate_job(job, 2);
+	}
+
+	ctx_timestamp = xe_lrc_ctx_timestamp(q->lrc[0]);
+	ctx_job_timestamp = xe_lrc_ctx_job_timestamp(q->lrc[0]);
+
 	/*
 	 * Counter wraps at ~223s at the usual 19.2MHz, be paranoid catch
 	 * possible overflows with a high timeout.
@@ -1120,10 +1130,6 @@ guc_exec_queue_timedout_job(struct drm_sched_job *drm_job)
 		exec_queue_killed_or_banned_or_wedged(q) ||
 		exec_queue_destroyed(q);
 
-	/* Job hasn't started, can't be timed out */
-	if (!skip_timeout_check && !xe_sched_job_started(job))
-		goto rearm;
-
 	/*
 	 * XXX: Sampling timeout doesn't work in wedged mode as we have to
 	 * modify scheduling state to read timestamp. We could read the




[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux