Re: [PATCH 8/8] drm/i915: Expose RPCS (SSEU) configuration to userspace

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 14/08/2018 15:59, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>

We want to allow userspace to reconfigure the subslice configuration for
its own use case. To do so, we expose a context parameter to allow
adjustment of the RPCS register stored within the context image (and
currently not accessible via LRI). If the context is adjusted before
first use, the adjustment is for "free"; otherwise if the context is
active we flush the context off the GPU (stalling all users) and forcing
the GPU to save the context to memory where we can modify it and so
ensure that the register is reloaded on next execution.

The overhead of managing additional EU subslices can be significant,
especially in multi-context workloads. Non-GPGPU contexts should
preferably disable the subslices it is not using, and others should
fine-tune the number to match their workload.

We expose complete control over the RPCS register, allowing
configuration of slice/subslice, via masks packed into a u64 for
simplicity. For example,

         struct drm_i915_gem_context_param arg;
         struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
                                                         .instance = 0, };

         memset(&arg, 0, sizeof(arg));
         arg.ctx_id = ctx;
         arg.param = I915_CONTEXT_PARAM_SSEU;
         arg.value = (uintptr_t) &sseu;
         if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
                 sseu.packed.subslice_mask = 0;

                 drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
         }

could be used to disable all subslices where supported.

v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)

v3: Add ability to program this per engine (Chris)

v4: Move most get_sseu() into i915_gem_context.c (Lionel)

v5: Validate sseu configuration against the device's capabilities (Lionel)

v6: Change context powergating settings through MI_SDM on kernel context (Chris)

v7: Synchronize the requests following a powergating setting change using a global
     dependency (Chris)
     Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
     Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)

v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
     s/dev_priv/i915/ (Tvrtko)
     Change uapi class/instance fields to u16 (Tvrtko)
     Bump mask fields to 64bits (Lionel)
     Don't return EPERM when dynamic sseu is disabled (Tvrtko)

v9: Import context image into kernel context's ppgtt only when
     reconfiguring powergated slice/subslices (Chris)
     Use aliasing ppgtt when needed (Michel)

Tvrtko Ursulin:

v10:
  * Update for upstream changes.
  * Request submit needs a RPM reference.
  * Reject on !FULL_PPGTT for simplicity.
  * Pull out get/set param to helpers for readability and less indent.
  * Use i915_request_await_dma_fence in add_global_barrier to skip waits
    on the same timeline and avoid GEM_BUG_ON.
  * No need to explicitly assign a NULL pointer to engine in legacy mode.
  * No need to move gen8_make_rpcs up.
  * Factored out global barrier as prep patch.
  * Allow to only CAP_SYS_ADMIN if !Gen11.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
Issue: https://github.com/intel/media-driver/issues/267
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@xxxxxxxxx>
Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@xxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Cc: Zhipeng Gong <zhipeng.gong@xxxxxxxxx>
Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
  drivers/gpu/drm/i915/i915_gem_context.c | 187 +++++++++++++++++++++++-
  drivers/gpu/drm/i915/intel_lrc.c        |  55 +++++++
  drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
  include/uapi/drm/i915_drm.h             |  43 ++++++
  4 files changed, 288 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 8a12984e7495..6d6220634e9e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
         return 0;
  }
+static int
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+                                 struct intel_engine_cs *engine,
+                                 struct intel_sseu sseu)
+{
+       struct drm_i915_private *i915 = ctx->i915;
+       struct i915_request *rq;
+       struct intel_ring *ring;
+       int ret;
+
+       lockdep_assert_held(&i915->drm.struct_mutex);
+
+       /* Submitting requests etc needs the hw awake. */
+       intel_runtime_pm_get(i915);
+
+       i915_retire_requests(i915);

?

I wondered myself but did not make myself dig through all the history of the series. Cannot think that it does anything useful in the current design.

+
+       /* Now use the RCS to actually reconfigure. */
+       engine = i915->engine[RCS];

? Modifying registers stored in another engine's context image.

Well, I was hoping design was kind of agreed by now.

+
+       rq = i915_request_alloc(engine, i915->kernel_context);
+       if (IS_ERR(rq)) {
+               ret = PTR_ERR(rq);
+               goto out_put;
+       }
+
+       ret = engine->emit_rpcs_config(rq, ctx, sseu);

It's just an LRI, I'd rather we do it directly unless there's evidence
that there will be na explicit rpcs config instruction in future. It
just doesn't seem general enough.

No complaints, can do.

+       if (ret)
+               goto out_add;
+
+       /* Queue this switch after all other activity */

Only needs to be after the target ctx.

True, so find just the timeline belonging to target context. Some backpointers would be needed to find it. Or a walk and compare against target ce->ring->timeline->fence_context. Sounds like more than the latter isn't justified for this use case.


+       list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
+               struct i915_request *prev;
+
+               prev = last_request_on_engine(ring->timeline, engine);

As constructed above you need target-engine + RCS.

Target engine is always RCS. Looks like the engine param to this function is pointless.


+               if (prev)
+                       i915_sw_fence_await_sw_fence_gfp(&rq->submit,
+                                                        &prev->submit,
+                                                        I915_FENCE_GFP);
+       }
+
+       i915_gem_set_global_barrier(i915, rq);

This is just for a link from ctx-engine to this rq. Overkill much?
Presumably this stems from using the wrong engine.

AFAIU this is to ensure target context cannot get ahead of this request. Without it could overtake and then there is no guarantee execbufs following set param will be with new SSEU configuration.

+
+out_add:
+       i915_request_add(rq);

And I'd still recommend not using indirect access if we can apply the
changes immediately.

Direct (CPU) access means blocking in set param until the context is idle. AFAIR in one of the earlier version GPU idling approach was used but I thought that is very dangerous API. Can we somehow take just one context out of circulation? Still have to idle since it could be deep in the queue. The current approach is nicely async. And I don't like having two paths - direct if unpinned, or via GPU if pinned. In fact it would even be a third path since normally it happens via init_reg_state of pristine contexts.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux