Quoting Tvrtko Ursulin (2018-08-14 15:40:58)
> From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
>
> We want to allow userspace to reconfigure the subslice configuration for
> its own use case. To do so, we expose a context parameter to allow
> adjustment of the RPCS register stored within the context image (and
> currently not accessible via LRI). If the context is adjusted before
> first use, the adjustment is for "free"; otherwise if the context is
> active we flush the context off the GPU (stalling all users) and forcing
> the GPU to save the context to memory where we can modify it and so
> ensure that the register is reloaded on next execution.
>
> The overhead of managing additional EU subslices can be significant,
> especially in multi-context workloads. Non-GPGPU contexts should
> preferably disable the subslices it is not using, and others should
> fine-tune the number to match their workload.
>
> We expose complete control over the RPCS register, allowing
> configuration of slice/subslice, via masks packed into a u64 for
> simplicity. For example,
>
>	struct drm_i915_gem_context_param arg;
>	struct drm_i915_gem_context_param_sseu sseu = { .class = 0,
>							.instance = 0, };
>
>	memset(&arg, 0, sizeof(arg));
>	arg.ctx_id = ctx;
>	arg.param = I915_CONTEXT_PARAM_SSEU;
>	arg.value = (uintptr_t) &sseu;
>	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg) == 0) {
>		sseu.packed.subslice_mask = 0;
>
>		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
>	}
>
> could be used to disable all subslices where supported.
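
For the "fine-tune the number" case, userspace would presumably want to
shrink the mask that GETPARAM reported rather than zero it, so that the
result stays a subset of what the device has. A minimal sketch of that
trimming step, using only plain bit operations; keep_lowest_set_bits() is
a hypothetical helper for illustration and is not part of this patch:

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the patch): return a mask containing only
 * the lowest 'count' set bits of 'mask', with every other bit cleared.
 */
static uint64_t keep_lowest_set_bits(uint64_t mask, unsigned int count)
{
	uint64_t out = 0;

	while (mask && count--) {
		/* Isolate the lowest set bit still left in mask. */
		uint64_t lowest = mask & -mask;

		out |= lowest;
		mask &= ~lowest;
	}

	return out;
}
```

In the example above that would replace the "= 0" assignment with
something like
sseu.packed.subslice_mask = keep_lowest_set_bits(sseu.packed.subslice_mask, 2);
before the SETPARAM call. Whether the kernel accepts the result still
depends on the validation added in v5.
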
>
> v2: Fix offset of CTX_R_PWR_CLK_STATE in intel_lr_context_set_sseu() (Lionel)
>
> v3: Add ability to program this per engine (Chris)
>
> v4: Move most get_sseu() into i915_gem_context.c (Lionel)
>
> v5: Validate sseu configuration against the device's capabilities (Lionel)
>
> v6: Change context powergating settings through MI_SDM on kernel context (Chris)
>
> v7: Synchronize the requests following a powergating setting change using a global
>     dependency (Chris)
>     Iterate timelines through dev_priv.gt.active_rings (Tvrtko)
>     Disable RPCS configuration setting for non capable users (Lionel/Tvrtko)
>
> v8: s/union intel_sseu/struct intel_sseu/ (Lionel)
>     s/dev_priv/i915/ (Tvrtko)
>     Change uapi class/instance fields to u16 (Tvrtko)
>     Bump mask fields to 64bits (Lionel)
>     Don't return EPERM when dynamic sseu is disabled (Tvrtko)
>
> v9: Import context image into kernel context's ppgtt only when
>     reconfiguring powergated slice/subslices (Chris)
>     Use aliasing ppgtt when needed (Michel)
>
> Tvrtko Ursulin:
>
> v10:
>  * Update for upstream changes.
>  * Request submit needs a RPM reference.
>  * Reject on !FULL_PPGTT for simplicity.
>  * Pull out get/set param to helpers for readability and less indent.
>  * Use i915_request_await_dma_fence in add_global_barrier to skip waits
>    on the same timeline and avoid GEM_BUG_ON.
>  * No need to explicitly assign a NULL pointer to engine in legacy mode.
>  * No need to move gen8_make_rpcs up.
>  * Factored out global barrier as prep patch.
>  * Allow to only CAP_SYS_ADMIN if !Gen11.
>
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=100899
> Issue: https://github.com/intel/media-driver/issues/267
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@xxxxxxxxx>
> Cc: Dmitry Rogozhkin <dmitry.v.rogozhkin@xxxxxxxxx>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> Cc: Zhipeng Gong <zhipeng.gong@xxxxxxxxx>
> Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 187 +++++++++++++++++++++++-
>  drivers/gpu/drm/i915/intel_lrc.c        |  55 +++++++
>  drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
>  include/uapi/drm/i915_drm.h             |  43 ++++++
>  4 files changed, 288 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 8a12984e7495..6d6220634e9e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -773,6 +773,91 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>  	return 0;
>  }
>
> +static int
> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> +				  struct intel_engine_cs *engine,
> +				  struct intel_sseu sseu)
> +{
> +	struct drm_i915_private *i915 = ctx->i915;
> +	struct i915_request *rq;
> +	struct intel_ring *ring;
> +	int ret;
> +
> +	lockdep_assert_held(&i915->drm.struct_mutex);
> +
> +	/* Submitting requests etc needs the hw awake. */
> +	intel_runtime_pm_get(i915);
> +
> +	i915_retire_requests(i915);

?

> +
> +	/* Now use the RCS to actually reconfigure. */
> +	engine = i915->engine[RCS];

? Modifying registers stored in another engine's context image.

> +
> +	rq = i915_request_alloc(engine, i915->kernel_context);
> +	if (IS_ERR(rq)) {
> +		ret = PTR_ERR(rq);
> +		goto out_put;
> +	}
> +
> +	ret = engine->emit_rpcs_config(rq, ctx, sseu);

It's just an LRI, I'd rather we do it directly unless there's evidence
that there will be an explicit rpcs config instruction in future. It
just doesn't seem general enough.

> +	if (ret)
> +		goto out_add;
> +
> +	/* Queue this switch after all other activity */

Only needs to be after the target ctx.

> +	list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
> +		struct i915_request *prev;
> +
> +		prev = last_request_on_engine(ring->timeline, engine);

As constructed above you need target-engine + RCS.

> +		if (prev)
> +			i915_sw_fence_await_sw_fence_gfp(&rq->submit,
> +							 &prev->submit,
> +							 I915_FENCE_GFP);
> +	}
> +
> +	i915_gem_set_global_barrier(i915, rq);

This is just for a link from ctx-engine to this rq. Overkill much?
Presumably this stems from using the wrong engine.

> +
> +out_add:
> +	i915_request_add(rq);

And I'd still recommend not using indirect access if we can apply the
changes immediately.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx