On 14/12/2021 15:07, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Log engine resets done by the GuC firmware in a similar way to how the
execlists backend does it.
This way we have some notion of where the hangs are before the GuC gains
support for proper error capture.
Ping - any interest in logging this info?
All we currently get is a non-descriptive "[drm] GPU HANG: ecode
12:0:00000000".
Also, will GuC be reporting the reason for the engine reset at any point?
Regards,
Tvrtko
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Cc: Matthew Brost <matthew.brost@xxxxxxxxx>
Cc: John Harrison <John.C.Harrison@xxxxxxxxx>
---
drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c | 12 +++++++++++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 97311119da6f..51512123dc1a 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -11,6 +11,7 @@
 #include "gt/intel_context.h"
 #include "gt/intel_engine_pm.h"
 #include "gt/intel_engine_heartbeat.h"
+#include "gt/intel_engine_user.h"
 #include "gt/intel_gpu_commands.h"
 #include "gt/intel_gt.h"
 #include "gt/intel_gt_clock_utils.h"
@@ -3934,9 +3935,18 @@ static void capture_error_state(struct intel_guc *guc,
 {
 	struct intel_gt *gt = guc_to_gt(guc);
 	struct drm_i915_private *i915 = gt->i915;
-	struct intel_engine_cs *engine = __context_to_physical_engine(ce);
+	struct intel_engine_cs *engine = ce->engine;
 	intel_wakeref_t wakeref;
 
+	if (intel_engine_is_virtual(engine)) {
+		drm_notice(&i915->drm, "%s class, engines 0x%x; GuC engine reset\n",
+			   intel_engine_class_repr(engine->class),
+			   engine->mask);
+		engine = guc_virtual_get_sibling(engine, 0);
+	} else {
+		drm_notice(&i915->drm, "%s GuC engine reset\n", engine->name);
+	}
+
 	intel_engine_set_hung_context(engine, ce);
 	with_intel_runtime_pm(&i915->runtime_pm, wakeref)
 		i915_capture_error_state(gt, engine->mask);
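The following is a minimal userspace sketch of the decision this patch adds to capture_error_state(), to make the new log lines easier to follow. It is not i915 code. struct engine, class_repr() and log_guc_engine_reset() are hypothetical stand-ins. The short class names only assume what intel_engine_class_repr() returns, and the engine masks in the example are made up. The sketch reports a virtual engine by class and sibling mask, then runs error capture against sibling 0. It reports a physical engine by name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical, simplified stand-ins for the i915 engine structures; the
 * real struct intel_engine_cs has many more members.
 */
enum engine_class {
	RENDER_CLASS,
	VIDEO_DECODE_CLASS,
	VIDEO_ENHANCEMENT_CLASS,
	COPY_ENGINE_CLASS,
};

#define MAX_SIBLINGS 8

struct engine {
	const char *name;		/* physical engine name, e.g. "vcs0" */
	enum engine_class class;
	unsigned int mask;		/* one bit per physical engine */
	bool is_virtual;
	const struct engine *siblings[MAX_SIBLINGS]; /* used only when virtual */
};

/* Assumed short class names, modelled on intel_engine_class_repr(). */
static const char *class_repr(enum engine_class class)
{
	switch (class) {
	case RENDER_CLASS:		return "rcs";
	case VIDEO_DECODE_CLASS:	return "vcs";
	case VIDEO_ENHANCEMENT_CLASS:	return "vecs";
	case COPY_ENGINE_CLASS:		return "bcs";
	}
	return "<invalid>";
}

/*
 * Write the reset notice into buf (same format strings as the patch) and
 * return the physical engine that error capture should run against.
 */
static const struct engine *
log_guc_engine_reset(const struct engine *engine, char *buf, size_t len)
{
	if (engine->is_virtual) {
		/* Report the whole set of siblings the context could run on. */
		snprintf(buf, len, "%s class, engines 0x%x; GuC engine reset\n",
			 class_repr(engine->class), engine->mask);
		/* Mirror guc_virtual_get_sibling(engine, 0) from the patch. */
		assert(engine->siblings[0] != NULL);
		return engine->siblings[0];
	}

	snprintf(buf, len, "%s GuC engine reset\n", engine->name);
	return engine;
}
```

Consider two hypothetical siblings, vcs0 (mask 0x4) and vcs1 (mask 0x8). A reset on the virtual engine would log "vcs class, engines 0xc; GuC engine reset", and state would be captured against vcs0. The sketch logs the mask rather than a single engine name, presumably because the driver is not told by the GuC which sibling was actually running the context. Picking sibling 0 for capture appears to keep what __context_to_physical_engine() did before this change.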