Quoting Tvrtko Ursulin (2018-05-16 16:40:55)
> 
> On 15/05/2018 10:05, Tvrtko Ursulin wrote:
> > 
> > On 14/05/2018 16:56, Lionel Landwerlin wrote:
> >> From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >>
> >> We want to allow userspace to reconfigure the subslice configuration
> >> for its own use case. To do so, we expose a context parameter to
> >> allow adjustment of the RPCS register stored within the context image
> >> (and currently not accessible via LRI). If the context is adjusted
> >> before first use, the adjustment is for "free"; otherwise if the
> >> context is active we flush the context off the GPU (stalling all
> >> users) and force the GPU to save the context to memory where we can
> >> modify it, and so ensure that the register is reloaded on next
> >> execution.
> >>
> >> The overhead of managing additional EU subslices can be significant,
> >> especially in multi-context workloads. Non-GPGPU contexts should
> >> preferably disable the subslices they are not using, and others
> >> should fine-tune the number to match their workload.
> >>
> >> We expose complete control over the RPCS register, allowing
> >> configuration of slice/subslice, via masks packed into a u64 for
> >> simplicity. For example,
> >>
> >>    struct drm_i915_gem_context_param arg;
> >>    struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
> >>                                                    .instance = 0, };
> >>
> >>    memset(&arg, 0, sizeof(arg));
> >>    arg.ctx_id = ctx;
> >>    arg.param = I915_CONTEXT_PARAM_SSEU;
> >>    arg.value = (uintptr_t) &sseu;
> >>    if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
> >>        sseu.packed.subslice_mask = 0;
> >>
> >>        drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
> >>    }
> >>
> >> could be used to disable all subslices where supported.
> >>
> >> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu()
> >>     (Lionel)
> >>
> >> v3: Add ability to program this per engine (Chris)
> >>
> >> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
> >>
> >> v5: Validate sseu configuration against the device's capabilities
> >>     (Lionel)
> >>
> >> v6: Change context powergating settings through MI_SDM on kernel
> >>     context (Chris)
> >>
> >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> >> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@xxxxxxxxx>
> >> CC: Dmitry Rogozhkin <dmitry.v.rogozhkin@xxxxxxxxx>
> >> CC: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> >> CC: Zhipeng Gong <zhipeng.gong@xxxxxxxxx>
> >> CC: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> >> ---
> >>  drivers/gpu/drm/i915/i915_gem_context.c | 169 ++++++++++++++++++++++++
> >>  drivers/gpu/drm/i915/intel_lrc.c        | 103 ++++++++++-----
> >>  drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
> >>  drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
> >>  include/uapi/drm/i915_drm.h             |  38 ++++++
> >>  5 files changed, 281 insertions(+), 35 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> >> index 01310c99e032..0b72a771c3f3 100644
> >> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> >> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >> @@ -734,6 +734,110 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> >>  	return 0;
> >>  }
> >> +
> >> +static int
> >> +intel_sseu_from_user_sseu(const struct sseu_dev_info *sseu,
> >> +			  const struct drm_i915_gem_context_param_sseu *user_sseu,
> >> +			  union intel_sseu *ctx_sseu)
> >> +{
> >> +	if ((user_sseu->slice_mask & ~sseu->slice_mask) != 0 ||
> >> +	    user_sseu->slice_mask == 0)
> >> +		return -EINVAL;
> >> +
> >> +	if ((user_sseu->subslice_mask & ~sseu->subslice_mask[0]) != 0 ||
> >> +	    user_sseu->subslice_mask == 0)
> >> +		return -EINVAL;
> >> +
> >> +	if (user_sseu->min_eus_per_subslice > sseu->max_eus_per_subslice)
> >> +		return -EINVAL;
> >> +
> >> +	if (user_sseu->max_eus_per_subslice > sseu->max_eus_per_subslice ||
> >> +	    user_sseu->max_eus_per_subslice < user_sseu->min_eus_per_subslice ||
> >> +	    user_sseu->max_eus_per_subslice == 0)
> >> +		return -EINVAL;
> >> +
> >> +	ctx_sseu->slice_mask = user_sseu->slice_mask;
> >> +	ctx_sseu->subslice_mask = user_sseu->subslice_mask;
> >> +	ctx_sseu->min_eus_per_subslice = user_sseu->min_eus_per_subslice;
> >> +	ctx_sseu->max_eus_per_subslice = user_sseu->max_eus_per_subslice;
> >> +
> >> +	return 0;
> >> +}
> >> +
> >> +static int
> >> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> >> +				  struct intel_engine_cs *engine,
> >> +				  union intel_sseu sseu)
> >> +{
> >> +	struct drm_i915_private *dev_priv = ctx->i915;
> >> +	struct i915_timeline *timeline;
> >> +	struct i915_request *rq;
> >> +	union intel_sseu actual_sseu;
> >> +	enum intel_engine_id id;
> >> +	int ret;
> >> +
> >> +	/*
> >> +	 * First notify user when this capability is not available so that it
> >> +	 * can be detected with any valid input.
> >> +	 */
> >> +	if (!engine->emit_rpcs_config)
> >> +		return -ENODEV;
> >> +
> >> +	if (to_intel_context(ctx, engine)->sseu.value == sseu.value)
> > 
> > Are there other uses for the value union in the series? Think whether it
> > could be dropped and memcmp used here for simplicity.
> > 
> >> +		return 0;
> >> +
> >> +	lockdep_assert_held(&dev_priv->drm.struct_mutex);
> >> +
> >> +	i915_retire_requests(dev_priv);
> >> +
> >> +	/* Now use the RCS to actually reconfigure.
> >> +	 */
> >> +	engine = dev_priv->engine[RCS];
> >> +
> >> +	rq = i915_request_alloc(engine, dev_priv->kernel_context);
> >> +	if (IS_ERR(rq))
> >> +		return PTR_ERR(rq);

[snip]

> >> +	ret = engine->emit_rpcs_config(rq, ctx, actual_sseu);
> >> +	if (ret) {
> >> +		__i915_request_add(rq, true);
> >> +		return ret;
> >> +	}
> >> +
> >> +	/* Queue this switch after all other activity */
> >> +	list_for_each_entry(timeline, &dev_priv->gt.timelines, link) {
> >> +		struct i915_request *prev;
> >> +
> >> +		prev = last_request_on_engine(timeline, engine);
> >> +		if (prev)
> >> +			i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> >> +							 &prev->submit,
> >> +							 I915_FENCE_GFP);
> >> +	}
> >> +
> >> +	__i915_request_add(rq, true);
> > 
> > This is actually the bit I was reading the patch for. So this I think
> > is much better/safer than previous idling. However one concern, and
> > maybe not a concern but just something which needs to be explained in
> > the uAPI, is what it means with regards to the point from which the
> > new configuration becomes live.
> > 
> > Could preemption for instance make it not defined enough? Could we, or
> > should we, also link this request somehow onto the issuing context
> > timeline so it must be first there? Hm, should we use the user context
> > instead of the kernel one, and set the highest priority? But would
> > have to avoid triggering preemption.

Preemption is a concern with the above code. A barrier does need to be
added from the target context back to the SDM request.

What I had in mind was preferably doing LRI _from_ each context. Then it
is naturally ordered with respect to execution on every context.

Failing that, to allow SDM you need to add a nop request on each context
with await on the sdm request, or we assign the sdm request a U32_MAX
priority. (Using that priority boost would massively mess up the
timelines on the system though, so try to avoid it. ;)
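[Editorial aside: the nop-request barrier described above could look roughly
like the following. This is pseudocode, not compile-tested; the helper names
are taken from this patch, but the exact signatures and how the target
contexts are iterated are assumptions.]

```
/* Pseudocode: for each context that must observe the new RPCS value,
 * queue a nop request that awaits the SDM request "rq", so that no
 * subsequent work on that context can run ahead of the reconfigure. */
nop = i915_request_alloc(engine, ctx);
if (IS_ERR(nop))
	return PTR_ERR(nop);

/* The nop (and everything queued on ctx after it) executes only once
 * the SDM request has been submitted. */
i915_sw_fence_await_sw_fence_gfp(&nop->submit, &rq->submit,
				 I915_FENCE_GFP);

__i915_request_add(nop, true);
```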
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx