Re: [PATCH] drm/i915: Set wedged if enable guc communication failed

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 3/2/2023 1:50 PM, Zhanjun Dong wrote:
Add err code check for enable_communication on resume path. When resume failed, we can no longer use the GPU, marking the GPU as wedged.

This is slightly incorrect. If we fail to enable communication, the consequence is that we can't use the GuC. That translates to a failure to use the GT only if GuC submission is enabled; in execlists submission mode we can keep going, although we might end up losing HuC functionality (which is not considered a fatal error). Therefore, the code below should be updated to reflect this.


Signed-off-by: Zhanjun Dong <zhanjun.dong@xxxxxxxxx>
---
  drivers/gpu/drm/i915/gt/intel_gt_pm.c | 7 ++++++-
  drivers/gpu/drm/i915/gt/uc/intel_uc.c | 9 +++++++--
  2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index cef3d6f5c34e..42a7d75ce39e 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -401,8 +401,13 @@ int intel_gt_runtime_resume(struct intel_gt *gt)
  	intel_ggtt_restore_fences(gt->ggtt);
ret = intel_uc_runtime_resume(&gt->uc);
-	if (ret)
+	if (ret) {
+		/* Resume failed, we can no longer use the GPU, marking the GPU
+		 * as wedged.
+		 */
+		intel_gt_set_wedged(gt);

intel_gt_set_wedged calls intel_runtime_pm_get, so it will deadlock if you call if from within the runtime_resume flow.

  		return ret;
+	}
return 0;
  }
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
index 6648691bd645..d4f428acf20a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
@@ -698,8 +698,13 @@ static int __uc_resume(struct intel_uc *uc, bool enable_communication)
  	/* Make sure we enable communication if and only if it's disabled */
  	GEM_BUG_ON(enable_communication == intel_guc_ct_enabled(&guc->ct));
- if (enable_communication)
-		guc_enable_communication(guc);
+	if (enable_communication) {
+		err = guc_enable_communication(guc);
+		if (err) {
+			guc_dbg(guc, "Failed to resume, %pe", ERR_PTR(err));

nit: this isn't exactly a failure to resume because the GuC is running. It's more a failure to re-establish communication with the GuC.

Daniele

+			return err;
+		}
+	}
/* If we are only resuming GuC communication but not reloading
  	 * GuC, we need to ensure the ARAT timer interrupt is enabled




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux