Re: [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Mon, 21 May 2018 17:00:13 +0100





On 21/05/2018 14:22, Lionel Landwerlin wrote:
On 15/05/18 10:05, Tvrtko Ursulin wrote:

On 14/05/2018 16:56, Lionel Landwerlin wrote:
From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

    struct drm_i915_gem_context_param arg;
    struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
                                                    .instance = 0, };

    memset(&arg, 0, sizeof(arg));
    arg.ctx_id = ctx;
    arg.param = I915_CONTEXT_PARAM_SSEU;
    arg.value = (uintptr_t) &sseu;
    if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
        sseu.packed.subslice_mask = 0;

        drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
    }

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
(Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities 
(Lionel)

v6: Change context powergating settings through MI_SDM on kernel 
context (Chris)

[snip]


+ intel_sseu_from_device_sseu(&INTEL_INFO(dev_priv)->sseu) :
+        sseu;
+
+    ret = engine->emit_rpcs_config(rq, ctx, actual_sseu);
+    if (ret) {
+        __i915_request_add(rq, true);
+        return ret;
+    }
+
+    /* Queue this switch after all other activity */
+    list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {

This can iterate over gt.active_rings for a shorter walk. See current 
state of engine_has_idle_kernel_context.

+        struct i915_request *prev;
+
+        prev = last_request_on_engine(timeline, engine);
+        if (prev)
+ i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+                             &prev->submit,
+                             I915_FENCE_GFP);
+    }
+
+    __i915_request_add(rq, true);

This is actually the bit I reading the patch for. So this I think is 
much better/safer than previous idling. However one concern, and maybe 
not a concern but just something which needs to be explained in the 
uAPI, is what it means with regards to the point from which the new 
configuration becomes live.

To me it's seems that all context ioctl effects are ordered.
So I would expect any execbuf after this ioctl would have the right 
configuration.
Anything before would have the previous configuration.

If that's your understanind too, I can add that in the documentation.

I guess I am missing what prevents the context which issued the SSEU 
re-configuraiton to run before the re-configuration request executes. 
They are on separate timelines and subsequent execbuf on the issuing 
context won't depend on it.


Could preemption for instance make it not defined enough? Could we, or 
should we, also link this request somehow onto the issuing context 
timeline so it must be first there? Hm, should we use the user context 
instead of the kernel one, and set the highest priority? But would 
have to avoid triggering preemption.

My understanding is that a context can't MI_LRI itself before Gen11.
I haven't tested that on Gen11 though.

I'm afraid that leaves us with only a max priority request tied to all 
the previous requests.
Or somehow keep a pointer to the last request to change the context 
powergating configuration and add a dependency on all new request until 
it's retired?

I lost the train of dependencies here. :) [snip]


+
+        param_sseu.slice_mask = ce->sseu.slice_mask;
+        param_sseu.subslice_mask = ce->sseu.subslice_mask;
+        param_sseu.min_eus_per_subslice = 
ce->sseu.min_eus_per_subslice;
+        param_sseu.max_eus_per_subslice = 
ce->sseu.max_eus_per_subslice;
+
+        if (copy_to_user(u64_to_user_ptr(args->value), &param_sseu,
+                 sizeof(param_sseu)))
+            ret = -EFAULT;
+        break;
+    }
      default:
          ret = -EINVAL;
          break;
@@ -845,7 +980,41 @@ int i915_gem_context_setparam_ioctl(struct 
drm_device *dev, void *data,
                  ctx->sched.priority = priority;
          }
          break;
+    case I915_CONTEXT_PARAM_SSEU:
+        {
+            struct drm_i915_private *dev_priv = to_i915(dev);
+            struct drm_i915_gem_context_param_sseu user_sseu;
+            struct intel_engine_cs *engine;
+            union intel_sseu ctx_sseu;
+
+            if (args->size) {
+                ret = -EINVAL;
+                break;
+            }
+
+            if (copy_from_user(&user_sseu, 
u64_to_user_ptr(args->value),
+                       sizeof(user_sseu))) {
+                ret = -EFAULT;
+                break;
+            }
+
+            engine = intel_engine_lookup_user(dev_priv,
+                              user_sseu.class,
+                              user_sseu.instance);
+            if (!engine) {
+                ret = -EINVAL;
+                break;
+            }
+
+            ret = 
intel_sseu_from_user_sseu(&INTEL_INFO(dev_priv)->sseu,
+                            &user_sseu, &ctx_sseu);

Should setter have a helper as well?

I'm not sure what you're asking here :(
Setter for what?

set_param vs get_param. I only noticed you have helper to conver struct 
sseu from user to kernel version for one direction only. It could be it 
doesn't make sense to have the other, haven't looked that deeply.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx