Re: [PATCH v5 1/3] drm/xe: Move LNL scheduling WA to xe_device.h

John Harrison <john.c.harrison@xxxxxxxxx> · Fri, 1 Nov 2024 11:48:06 -0700

On 10/29/2024 05:01, Nirmoy Das wrote:
Move LNL scheduling WA to xe_device.h so this can be used in other
places without needing keep the same comment about removal of this WA
in the future. The WA, which flushes work or workqueues, is now wrapped
in macros and can be reused wherever needed.

Cc: Badal Nilawar <badal.nilawar@xxxxxxxxx>
Cc: Matthew Auld <matthew.auld@xxxxxxxxx>
Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
Cc: Himal Prasad Ghimiray <himal.prasad.ghimiray@xxxxxxxxx>
Cc: Lucas De Marchi <lucas.demarchi@xxxxxxxxx>
cc: <stable@xxxxxxxxxxxxxxx> # v6.11+
Suggested-by: John Harrison <John.C.Harrison@xxxxxxxxx>
Signed-off-by: Nirmoy Das <nirmoy.das@xxxxxxxxx>
---
  drivers/gpu/drm/xe/xe_device.h | 14 ++++++++++++++
  drivers/gpu/drm/xe/xe_guc_ct.c | 11 +----------
  2 files changed, 15 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/xe/xe_device.h b/drivers/gpu/drm/xe/xe_device.h
index 4c3f0ebe78a9..f1fbfe916867 100644
--- a/drivers/gpu/drm/xe/xe_device.h
+++ b/drivers/gpu/drm/xe/xe_device.h
@@ -191,4 +191,18 @@ void xe_device_declare_wedged(struct xe_device *xe);
  struct xe_file *xe_file_get(struct xe_file *xef);
  void xe_file_put(struct xe_file *xef);
  
+/*
+ * Occasionally it is seen that the G2H worker starts running after a delay of more than
+ * a second even after being queued and activated by the Linux workqueue subsystem. This
+ * leads to G2H timeout error. The root cause of issue lies with scheduling latency of
+ * Lunarlake Hybrid CPU. Issue disappears if we disable Lunarlake atom cores from BIOS
+ * and this is beyond xe kmd.
+ *
+ * TODO: Drop this change once workqueue scheduling delay issue is fixed on LNL Hybrid CPU.
+ */
+#define LNL_FLUSH_WORKQUEUE(wq__) \
+	flush_workqueue(wq__)
+#define LNL_FLUSH_WORK(wrk__) \
+	flush_work(wrk__)
+
  #endif
diff --git a/drivers/gpu/drm/xe/xe_guc_ct.c b/drivers/gpu/drm/xe/xe_guc_ct.c
index 1b5d8fb1033a..703b44b257a7 100644
--- a/drivers/gpu/drm/xe/xe_guc_ct.c
+++ b/drivers/gpu/drm/xe/xe_guc_ct.c
@@ -1018,17 +1018,8 @@ static int guc_ct_send_recv(struct xe_guc_ct *ct, const u32 *action, u32 len,
  
  	ret = wait_event_timeout(ct->g2h_fence_wq, g2h_fence.done, HZ);
  
-	/*
-	 * Occasionally it is seen that the G2H worker starts running after a delay of more than
-	 * a second even after being queued and activated by the Linux workqueue subsystem. This
-	 * leads to G2H timeout error. The root cause of issue lies with scheduling latency of
-	 * Lunarlake Hybrid CPU. Issue dissappears if we disable Lunarlake atom cores from BIOS
-	 * and this is beyond xe kmd.
-	 *
-	 * TODO: Drop this change once workqueue scheduling delay issue is fixed on LNL Hybrid CPU.
-	 */
  	if (!ret) {
-		flush_work(&ct->g2h_worker);
+		LNL_FLUSH_WORK(&ct->g2h_worker);
  		if (g2h_fence.done) {
  			xe_gt_warn(gt, "G2H fence %u, action %04x, done\n",
  				   g2h_fence.seqno, action[0]);
This message is still wrong.

We have a warning that says 'job completed successfully'! That is 
misleading. It needs to say "done after flush" or "done but flush was 
required" or something along those lines.

John.