Re: [PATCH v5 7/7] drm/i915: Expose RPCS (SSEU) configuration to userspace

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Wed, 16 May 2018 16:51:50 +0100



Quoting Tvrtko Ursulin (2018-05-16 16:40:55)
> 
> On 15/05/2018 10:05, Tvrtko Ursulin wrote:
> > 
> > On 14/05/2018 16:56, Lionel Landwerlin wrote:
> >> From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>
> >> We want to allow userspace to reconfigure the subslice configuration for
> >> its own use case. To do so, we expose a context parameter to allow
> >> adjustment of the RPCS register stored within the context image (and
> >> currently not accessible via LRI). If the context is adjusted before
> >> first use, the adjustment is for "free"; otherwise if the context is
> >> active we flush the context off the GPU (stalling all users) and forcing
> >> the GPU to save the context to memory where we can modify it and so
> >> ensure that the register is reloaded on next execution.
> >>
> >> The overhead of managing additional EU subslices can be significant,
> >> especially in multi-context workloads. Non-GPGPU contexts should
> >> preferably disable the subslices it is not using, and others should
> >> fine-tune the number to match their workload.
> >>
> >> We expose complete control over the RPCS register, allowing
> >> configuration of slice/subslice, via masks packed into a u64 for
> >> simplicity. For example,
> >>
> >>     struct drm_i915_gem_context_param arg;
> >>     struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
> >>                                                     .instance = 0, };
> >>
> >>     memset(&arg, 0, sizeof(arg));
> >>     arg.ctx_id = ctx;
> >>     arg.param = I915_CONTEXT_PARAM_SSEU;
> >>     arg.value = (uintptr_t) &sseu;
> >>     if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
> >>         sseu.packed.subslice_mask = 0;
> >>
> >>         drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
> >>     }
> >>
> >> could be used to disable all subslices where supported.
> >>
> >> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() 
> >> (Lionel)
> >>
> >> v3: Add ability to program this per engine (Chris)
> >>
> >> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> >>
> >> v5: Validate sseu configuration against the device's capabilities 
> >> (Lionel)
> >>
> >> v6: Change context powergating settings through MI_SDM on kernel 
> >> context (Chris)
> >>
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> >> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@xxxxxxxxx>
> >> c: Dmitry Rogozhkin <dmitry.v.rogozhkin@xxxxxxxxx>
> >> CC: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> >> CC: Zhipeng Gong <zhipeng.gong@xxxxxxxxx>
> >> CC: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> >> ---
> >>   drivers/gpu/drm/i915/i915_gem_context.c | 169 ++++++++++++++++++++++++
> >>   drivers/gpu/drm/i915/intel_lrc.c        | 103 ++++++++++-----
> >>   drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
> >>   drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
> >>   include/uapi/drm/i915_drm.h             |  38 ++++++
> >>   5 files changed, 281 insertions(+), 35 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c 
> >> b/drivers/gpu/drm/i915/i915_gem_context.c
> >> index 01310c99e032..0b72a771c3f3 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >> @@ -734,6 +734,110 @@ int i915_gem_context_destroy_ioctl(struct 
> >> drm_device *dev, void *data,
> >>       return 0;
> >>   }
> >> +static int
> >> +intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
> >> +              const struct drm_i915_gem_context_param_sseu *user_sseu,
> >> +              union intel_sseu *ctx_sseu)
> >> +{
> >> +    if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
> >> +        user_sseu->slice_mask == 0)
> >> +        return -EINVAL;
> >> +
> >> +    if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
> >> +        user_sseu->subslice_mask == 0)
> >> +        return -EINVAL;
> >> +
> >> +    if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
> >> +        return -EINVAL;
> >> +
> >> +    if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
> >> +        user_sseu->max_eus_per_subslice < 
> >> user_sseu->min_eus_per_subslice ||
> >> +        user_sseu->max_eus_per_subslice == 0)
> >> +        return -EINVAL;
> >> +
> >> +    ctx_sseu->slice_mask = user_sseu->slice_mask;
> >> +    ctx_sseu->subslice_mask = user_sseu->subslice_mask;
> >> +    ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
> >> +    ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
> >> +
> >> +    return 0;
> >> +}
> >> +
> >> +static int
> >> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> >> +                  struct intel_engine_cs *engine,
> >> +                  union intel_sseu sseu)
> >> +{
> >> +    struct drm_i915_private *dev_priv = ctx->i915;
> >> +    struct i915_timeline *timeline;
> >> +    struct i915_request *rq;
> >> +    union intel_sseu actual_sseu;
> >> +    enum intel_engine_id id;
> >> +    int ret;
> >> +
> >> +    /*
> >> +     * First notify user when this capability is not available so 
> >> that it
> >> +     * can be detected with any valid input.
> >> +     */
> >> +    if (!engine->emit_rpcs_config)
> >> +        return -ENODEV;
> >> +
> >> +    if (to_intel_context(ctx, engine)->sseu.value == sseu.value)
> > 
> > Are there other uses for the value union in the series? Think whether it 
> > could be dropped and memcmp used here for simplicity.
> > 
> >> +        return 0;
> >> +
> >> +    lockdep_assert_held(&dev_priv->drm.struct_mutex);
> >> +
> >> +    i915_retire_requests(dev_priv);
> >> +
> >> +    /* Now use the RCS to actually reconfigure. */
> >> +    engine = dev_priv->engine[RCS];
> >> +
> >> +    rq = i915_request_alloc(engine, dev_priv->kernel_context);
> >> +    if (IS_ERR(rq))
> >> +        return PTR_ERR(rq);

[snip]

> >> +    ret = engine->emit_rpcs_config(rq, ctx, actual_sseu);
> >> +    if (ret) {
> >> +        __i915_request_add(rq, true);
> >> +        return ret;
> >> +    }
> >> +
> >> +    /* Queue this switch after all other activity */
> >> +    list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {
> >> +        struct i915_request *prev;
> >> +
> >> +        prev = last_request_on_engine(timeline, engine);
> >> +        if (prev)
> >> +            i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> >> +                             &prev->submit,
> >> +                             I915_FENCE_GFP);
> >> +    }
> >> +
> >> +    __i915_request_add(rq, true);
> > 
> > This is actually the bit I reading the patch for. So this I think is 
> > much better/safer than previous idling. However one concern, and maybe 
> > not a concern but just something which needs to be explained in the 
> > uAPI, is what it means with regards to the point from which the new 
> > configuration becomes live.
> > 
> > Could preemption for instance make it not defined enough? Could we, or 
> > should we, also link this request somehow onto the issuing context 
> > timeline so it must be first there? Hm, should we use the user context 
> > instead of the kernel one, and set the highest priority? But would have 
> > to avoid triggering preemption.

Preemption is a concern with the above code. A barrier does need to be
added from the target context back to the SDM request. What I had in
mind was preferably doing LRI _from_ each context. Then it is naturally
ordered with respect to execution on every context. Failing that, to
allow SDM you need to add a nop request on each context with await on
the sdm request, or we assign the sdm request a U32_MAX priority. (Using
that priority boost would massively mess up the timelines on the system
though, so try to avoid it. ;)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx