On 04/11/2020 12:30, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2020-11-04 12:20:42)
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Between events which trigger engine and GPU resets and capturing the error
state we lose information on which engine triggered the reset. Improve
this by passing in the hung engine mask down to error capture.
Result is that the list of engines in user visible "GPU HANG: ecode
<gen>:<engines>:<ecode>, <process>" is now a list of hanging and not just
active engines. Most importantly the displayed process is now the one
which was actually hung.
You could also suggest to only include the hanging engine in the report,
as is intended to be the normal means of generating the report
I thought it is potentially useful to have a full picture, but can do
that as well.
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 0220b0992808..3a7ca90a3436 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -59,6 +59,7 @@ struct i915_request_coredump {
struct intel_engine_coredump {
const struct intel_engine_cs *engine;
+ bool hung;
bool simulated;
u32 reset_count;
@@ -218,8 +219,10 @@ struct drm_i915_error_state_buf {
__printf(2, 3)
void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...);
-struct i915_gpu_coredump *i915_gpu_coredump(struct drm_i915_private *i915);
-void i915_capture_error_state(struct drm_i915_private *i915);
+struct i915_gpu_coredump *i915_gpu_coredump(struct intel_gt *gt,
+ intel_engine_mask_t engine_mask);
+void i915_capture_error_state(struct intel_gt *gt,
+ intel_engine_mask_t engine_mask);
Don't forget the stubs.
Right, thanks.
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx