On 12/5/2023 00:52, Nirmoy Das wrote:
gen8_engine_reset_prepare() can fail when HW fails to set
RESET_CTL_READY_TO_RESET bit. In some cases this is not fatal
error as driver will retry.
Convert the log to a trace log for debugging without triggering
unnecessary concerns in CI or for end-users during non-fatal scenarios.
I strongly disagree with this change. The hardware spec for the
RESET_CTL and GDRST registers are that they will self clear within a
matter of microseconds. If something is so badly wrong with the hardware
that it can't even manage to reset then that is something that very much
warrants more than a completely silent trace event. It most certainly
should be flagged as a failure in CI.
Just because the driver will retry does not mean that this is not a
serious error. And if the first attempt failed, why would a subsequent
attempt succeed? Escalating to FLR may have more success, but that is
not something that i915 currently does.
John.
v2: Improve commit message(Tvrtko)
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Cc: John Harrison <John.C.Harrison@xxxxxxxxx>
Cc: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
Cc: Andrzej Hajda <andrzej.hajda@xxxxxxxxx>
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/5591
Signed-off-by: Nirmoy Das <nirmoy.das@xxxxxxxxx>
Reviewed-by: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>
Reviewed-by: Andrzej Hajda <andrzej.hajda@xxxxxxxxx>
---
drivers/gpu/drm/i915/gt/intel_reset.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index d5ed904f355d..e6fbc6202c80 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -593,10 +593,10 @@ static int gen8_engine_reset_prepare(struct intel_engine_cs *engine)
ret = __intel_wait_for_register_fw(uncore, reg, mask, ack,
700, 0, NULL);
if (ret)
- gt_err(engine->gt,
- "%s reset request timed out: {request: %08x, RESET_CTL: %08x}\n",
- engine->name, request,
- intel_uncore_read_fw(uncore, reg));
+ GT_TRACE(engine->gt,
+ "%s reset request timed out: {request: %08x, RESET_CTL: %08x}\n",
+ engine->name, request,
+ intel_uncore_read_fw(uncore, reg));
return ret;
}