Re: [PATCH v6 6/7] drm/i915: Expose RPCS (SSEU) configuration to userspace

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 23/05/2018 18:12, Lionel Landwerlin wrote:
On 23/05/18 16:13, Tvrtko Ursulin wrote:

On 22/05/2018 19:00, Lionel Landwerlin wrote:
From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

    struct drm_i915_gem_context_param arg;
    struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
                                                    .instance = 0, };

    memset(&arg, 0, sizeof(arg));
    arg.ctx_id = ctx;
    arg.param = I915_CONTEXT_PARAM_SSEU;
    arg.value = (uintptr_t) &sseu;
    if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
        sseu.packed.subslice_mask = 0;

        drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
    }

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities (Lionel)

v6: Change context powergating settings through MI_SDM on kernel context (Chris)

v7: Synchronize the requests following a powergating setting change using a global
     dependency (Chris)
     Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
     Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@xxxxxxxxx>
c: Dmitry Rogozhkin <dmitry.v.rogozhkin@xxxxxxxxx>
CC: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
CC: Zhipeng Gong <zhipeng.gong@xxxxxxxxx>
CC: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
---
  drivers/gpu/drm/i915/i915_drv.h         |  13 ++
  drivers/gpu/drm/i915/i915_gem.c         |   2 +
  drivers/gpu/drm/i915/i915_gem_context.c | 167 ++++++++++++++++++++++++
  drivers/gpu/drm/i915/i915_request.c     |  20 +++
  drivers/gpu/drm/i915/intel_lrc.c        | 103 ++++++++++-----
  drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
  drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
  include/uapi/drm/i915_drm.h             |  38 ++++++
  8 files changed, 314 insertions(+), 35 deletions(-)

[snip]

  int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
                      struct drm_file *file)
  {
@@ -767,6 +864,37 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
      case I915_CONTEXT_PARAM_PRIORITY:
          args->value = ctx->sched.priority;
          break;
+    case I915_CONTEXT_PARAM_SSEU: {
+        struct drm_i915_gem_context_param_sseu param_sseu;
+        struct intel_engine_cs *engine;
+        struct intel_context *ce;
+
+        if (copy_from_user(&param_sseu, u64_to_user_ptr(args->value),
+                   sizeof(param_sseu))) {
+            ret = -EFAULT;
+            break;
+        }
+
+        engine = intel_engine_lookup_user(to_i915(dev),
+                          param_sseu.class,
+                          param_sseu.instance);
+        if (!engine) {
+            ret = -EINVAL;
+            break;
+        }
+
+        ce = to_intel_context(ctx, engine);
+
+        param_sseu.slice_mask = ce->sseu.slice_mask;
+        param_sseu.subslice_mask = ce->sseu.subslice_mask;
+        param_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice; +        param_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
+
+        if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
+                 sizeof(param_sseu)))
+            ret = -EFAULT;
+        break;

Should we think about maybe not implementing the getter? I mean, is it useful or just code for the driver which could be dropped?

Well, render fd can be transfer between processes, so I think it makes sense to have a way to tell what is the current configuration.

I was thinking that userspace can already get the default configuration via existing get params / topology query.

But if you change the ctx sseu config and pass that fd out its a bit evil. :)

Anyway, no strong feelings to keep it. Was just thinking whether we can save ourselves adding some code.

[snip]

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 7f5634ce8e88..24b90836ce1d 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1456,9 +1456,47 @@ struct drm_i915_gem_context_param {
  #define   I915_CONTEXT_MAX_USER_PRIORITY    1023 /* inclusive */
  #define   I915_CONTEXT_DEFAULT_PRIORITY        0
  #define   I915_CONTEXT_MIN_USER_PRIORITY    -1023 /* inclusive */
+    /*
+     * When using the following param, value should be a pointer to
+     * drm_i915_gem_context_param_sseu.
+     */
+#define I915_CONTEXT_PARAM_SSEU        0x7
      __u64 value;
  };
  +struct drm_i915_gem_context_param_sseu {
+    /*
+     * Engine class & instance to be configured or queried.
+     */
+    __u32 class;
+    __u32 instance;

Chris and I were talking about whether u32 is overkill and we should settle for u16:u16 for class:instance. I think 16-bit should be enough. But it can also be u32, I don't think there are any real downsides here unless we want to be consistent in all places.

Let me know what you think is best.

I'd say u16:u16, Chris?



+
+    /*
+     * Mask of slices to enable for the context. Valid values are a subset
+     * of the bitmask value returned for I915_PARAM_SLICE_MASK.
+     */
+    __u8 slice_mask;
+
+    /*
+     * Mask of subslices to enable for the context. Valid values are a
+     * subset of the bitmask value return by I915_PARAM_SUBSLICE_MASK.
+     */
+    __u8 subslice_mask;

Is this future proof enough, say for Gen11?

As far as I can see, this fits.
No objection to bump it to 16/32bits if you'd like.

Feel like I've asked you this before, sorry - nothing in the future will need per slice subslice mask?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [AMD Graphics]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux