Re: [PATCH v4 1/5] drm/i915: Add engine reset count in get-reset-stats ioctl

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 06/03/2019 23:08, Carlos Santa wrote:
On Mon, 2019-02-25 at 13:34 +0000, Tvrtko Ursulin wrote:
On 21/02/2019 02:58, Carlos Santa wrote:
From: Michel Thierry <michel.thierry@xxxxxxxxx>

Users/tests relying on the total reset count will start seeing a
smaller
number since most of the hangs can be handled by engine reset.
Note that if reset engine x, context a running on engine y will be
unaware
and unaffected.

To start the discussion, include just a total engine reset count.
If it
is deemed useful, it can be extended to report each engine
separately.

Our igt's gem_reset_stats test will need changes to ignore the pad
field,
since it can now return reset_engine_count.

v2: s/engine_reset/reset_engine/, use union in uapi to not break
compatibility.
v3: Keep rejecting attempts to use pad as input (Antonio)
v4: Rebased.
v5: Rebased.

Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
Cc: Antonio Argenziano <antonio.argenziano@xxxxxxxxx>
Cc: Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
Signed-off-by: Michel Thierry <michel.thierry@xxxxxxxxx>
Signed-off-by: Carlos Santa <carlos.santa@xxxxxxxxx>
---
   drivers/gpu/drm/i915/i915_gem_context.c | 12 ++++++++++--
   include/uapi/drm/i915_drm.h             |  6 +++++-
   2 files changed, 15 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
b/drivers/gpu/drm/i915/i915_gem_context.c
index 459f8eae1c39..cbfe8f2eb3f2 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1889,6 +1889,8 @@ int i915_gem_context_reset_stats_ioctl(struct
drm_device *dev,
   	struct drm_i915_private *dev_priv = to_i915(dev);
   	struct drm_i915_reset_stats *args = data;
   	struct i915_gem_context *ctx;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
   	int ret;
if (args->flags || args->pad)
@@ -1907,10 +1909,16 @@ int
i915_gem_context_reset_stats_ioctl(struct drm_device *dev,
   	 * we should wrap the hangstats with a seqlock.
   	 */
- if (capable(CAP_SYS_ADMIN))
+	if (capable(CAP_SYS_ADMIN)) {
   		args->reset_count = i915_reset_count(&dev_priv-
gpu_error);
-	else
+		for_each_engine(engine, dev_priv, id)
+			args->reset_engine_count +=
+				i915_reset_engine_count(&dev_priv-
gpu_error,
+							engine);

If access to global GPU reset count is privileged, why is access to
global engine reset count not? It seems to be fundamentally same
level
of data leakage.

But access to global engine reset count (i915_reset_engine_count) is
indeed priviledged. They both are inside if(CAP_SYS_ADMIN){...}, or
maybe I am missing something?

Looks like I misread the diff, sorry. Been processing a lot of patches lately.

Regards,

Tvrtko


If we wanted to provide some numbers to unprivileged users I think
we
would need to store some counters per file_priv/context and return
those
when !CAP_SYS_ADMIN.

The question would be why access to global GPU reset count is
priviledged then? I can't think of a reason why it should be
priviledged. I think the new counter (per engine) should fall in the
same category as the global GPU reset one, right? So, can we make them
both unpriviledged?



+	} else {
   		args->reset_count = 0;
+		args->reset_engine_count = 0;
+	}
args->batch_active = atomic_read(&ctx->guilty_count);
   	args->batch_pending = atomic_read(&ctx->active_count);
diff --git a/include/uapi/drm/i915_drm.h
b/include/uapi/drm/i915_drm.h
index cc03ef9f885f..3f2c89740b0e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1642,7 +1642,11 @@ struct drm_i915_reset_stats {
   	/* Number of batches lost pending for execution, for this
context */
   	__u32 batch_pending;
- __u32 pad;
+	union {
+		__u32 pad;
+		/* Engine resets since boot/module reload, for all
contexts */
+		__u32 reset_engine_count;
+	};

Chris pointed out in some other review that anonymous unions are not
friendly towards C++ compilers.

Not sure what is the best option here. Renaming the field could
break
old userspace building against newer headers. Is that acceptable?


I dug up some old comments from Chris and he stated that recycling the
union like that would be a bad idea since that would make the pad field
an output only parameter thus invalidating gem_reset_stats...

Why can't we simply add a new field __u32 reset_engine_count; as part
of the drm_i915_reset_stats struct?

Regards,
Carlos

   };
struct drm_i915_gem_userptr {


Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux