Re: [PATCH v4 1/5] drm/i915: Add engine reset count in get-reset-stats ioctl

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 21/02/2019 02:58, Carlos Santa wrote:
From: Michel Thierry <michel.thierry@xxxxxxxxx>

Users/tests relying on the total reset count will start seeing a smaller
number since most of the hangs can be handled by engine reset.
Note that if reset engine x, context a running on engine y will be unaware
and unaffected.

To start the discussion, include just a total engine reset count. If it
is deemed useful, it can be extended to report each engine separately.

Our igt's gem_reset_stats test will need changes to ignore the pad field,
since it can now return reset_engine_count.

v2: s/engine_reset/reset_engine/, use union in uapi to not break compatibility.
v3: Keep rejecting attempts to use pad as input (Antonio)
v4: Rebased.
v5: Rebased.

Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
Cc: Antonio Argenziano <antonio.argenziano@xxxxxxxxx>
Cc: Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
Signed-off-by: Michel Thierry <michel.thierry@xxxxxxxxx>
Signed-off-by: Carlos Santa <carlos.santa@xxxxxxxxx>
---
  drivers/gpu/drm/i915/i915_gem_context.c | 12 ++++++++++--
  include/uapi/drm/i915_drm.h             |  6 +++++-
  2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 459f8eae1c39..cbfe8f2eb3f2 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1889,6 +1889,8 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
  	struct drm_i915_private *dev_priv = to_i915(dev);
  	struct drm_i915_reset_stats *args = data;
  	struct i915_gem_context *ctx;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
  	int ret;
if (args->flags || args->pad)
@@ -1907,10 +1909,16 @@ int i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
  	 * we should wrap the hangstats with a seqlock.
  	 */
- if (capable(CAP_SYS_ADMIN))
+	if (capable(CAP_SYS_ADMIN)) {
  		args->reset_count = i915_reset_count(&dev_priv->gpu_error);
-	else
+		for_each_engine(engine, dev_priv, id)
+			args->reset_engine_count +=
+				i915_reset_engine_count(&dev_priv->gpu_error,
+							engine);

If access to global GPU reset count is privileged, why is access to global engine reset count not? It seems to be fundamentally same level of data leakage.

If we wanted to provide some numbers to unprivileged users I think we would need to store some counters per file_priv/context and return those when !CAP_SYS_ADMIN.

+	} else {
  		args->reset_count = 0;
+		args->reset_engine_count = 0;
+	}
args->batch_active = atomic_read(&ctx->guilty_count);
  	args->batch_pending = atomic_read(&ctx->active_count);
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index cc03ef9f885f..3f2c89740b0e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1642,7 +1642,11 @@ struct drm_i915_reset_stats {
  	/* Number of batches lost pending for execution, for this context */
  	__u32 batch_pending;
- __u32 pad;
+	union {
+		__u32 pad;
+		/* Engine resets since boot/module reload, for all contexts */
+		__u32 reset_engine_count;
+	};

Chris pointed out in some other review that anonymous unions are not friendly towards C++ compilers.

Not sure what is the best option here. Renaming the field could break old userspace building against newer headers. Is that acceptable?

  };
struct drm_i915_gem_userptr {


Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux