Quoting Tvrtko Ursulin (2018-04-19 11:40:06)
> 
> On 19/04/2018 08:17, Chris Wilson wrote:
> > To ease the frequent and ugly pointer dance of
> > &request->gem_context->engine[request->engine->id] during request
> > submission, store that pointer as request->hw_context. One major
> > advantage that we will exploit later is that this decouples the logical
> > context state from the engine itself.
> 
> I can see merit in making all the places which work on engines or
> requests more logically and elegantly grab the engine specific context.
> 
> Authors of current unmerged work will probably be not so thrilled though.
> 
> > @@ -353,60 +346,56 @@ int intel_gvt_scan_and_shadow_workload(struct intel_vgpu_workload *workload)
> >  	struct intel_vgpu_submission *s = &vgpu->submission;
> >  	struct i915_gem_context *shadow_ctx = s->shadow_ctx;
> >  	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
> > -	int ring_id = workload->ring_id;
> > -	struct intel_engine_cs *engine = dev_priv->engine[ring_id];
> > -	struct intel_ring *ring;
> > +	struct intel_engine_cs *engine = dev_priv->engine[workload->ring_id];
> > +	struct intel_context *ce;
> >  	int ret;
> > 
> >  	lockdep_assert_held(&dev_priv->drm.struct_mutex);
> > 
> > -	if (workload->shadowed)
> > +	if (workload->req)
> >  		return 0;
> 
> GVT team will need to look at this hunk/function.

...

> At which point I skip over all GVT and continue further down. :)

That code doesn't merit your attention (or ours really until it is
improved, it really is staging/ quality).

> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 2fe779cab298..ed1c591ee866 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -125,16 +125,10 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
> >  	i915_ppgtt_put(ctx->ppgtt);
> > 
> >  	for (i = 0; i < I915_NUM_ENGINES; i++) {
> > -		struct intel_context *ce = &ctx->engine[i];
> > +		struct intel_context *ce = &ctx->__engine[i];
> > 
> > -		if (!ce->state)
> > -			continue;
> > -
> > -		WARN_ON(ce->pin_count);
> > -		if (ce->ring)
> > -			intel_ring_free(ce->ring);
> > -
> > -		__i915_gem_object_release_unless_active(ce->state->obj);
> > +		if (ce->ops)
> 
> Is ce->ops now the gate which was "if (!ce->state) continue" before, to
> avoid GEM_BUG_ON in execlists_context_destroy? I am wondering if this
> loop should not just be for_each_engine. Places which walk until
> I915_NUM_ENGINES should be very special and this one doesn't feel like
> one.

s/I915_NUM_ENGINES/ARRAY_SIZE(ctx->__engine)/

That's the semantic intent here: that we aren't just thinking about
engines, but the contents of the context struct. We tear down the struct.

> > @@ -1010,9 +1018,12 @@ bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
> >  	 */
> >  	rq = __i915_gem_active_peek(&engine->timeline->last_request);
> >  	if (rq)
> > -		return rq->gem_context == kernel_context;
> > -	else
> > -		return engine->last_retired_context == kernel_context;
> > +		return rq->gem_context == kctx;
> > +
> > +	if (!engine->last_retired_context)
> > +		return false;
> > +
> > +	return engine->last_retired_context->gem_context == kctx;
> 
> You thought this will be a bit more readable/obvious?

Hmm, I wrote this before I wrote to_intel_context(). If we use that,
it looks like

	const struct intel_context *kernel_context =
		to_intel_context(engine->i915->kernel_context, engine);
	struct i915_request *rq;

	rq = __i915_gem_active_peek(&engine->timeline->last_request);
	if (rq)
		return rq->hw_context == kernel_context;
	else
		return engine->last_retired_context == kernel_context;

which isn't as bad as the first pass was...
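(to_intel_context() itself is never quoted in this thread; going by the
call sites above, a minimal sketch of it, assuming engine->id indexes
the per-engine array, would be just

	static inline struct intel_context *
	to_intel_context(struct i915_gem_context *ctx,
			 const struct intel_engine_cs *engine)
	{
		return &ctx->__engine[engine->id];
	}

i.e. one central place for the array lookup instead of open-coding
&ctx->engine[engine->id] at every call site.)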
> > @@ -511,9 +511,7 @@ static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
> >  {
> >  	struct intel_guc_client *client = guc->execbuf_client;
> >  	struct intel_engine_cs *engine = rq->engine;
> > -	u32 ctx_desc =
> > -		lower_32_bits(intel_lr_context_descriptor(rq->gem_context,
> > -							  engine));
> > +	u32 ctx_desc = lower_32_bits(rq->hw_context->lrc_desc);
> 
> Ha...
> 
> >  	u32 ring_tail = intel_ring_set_tail(rq->ring, rq->tail) / sizeof(u64);
> > 
> >  	spin_lock(&client->wq_lock);
> > @@ -551,8 +549,8 @@ static void inject_preempt_context(struct work_struct *work)
> >  					     preempt_work[engine->id]);
> >  	struct intel_guc_client *client = guc->preempt_client;
> >  	struct guc_stage_desc *stage_desc = __get_stage_desc(client);
> > -	u32 ctx_desc = lower_32_bits(intel_lr_context_descriptor(client->owner,
> > -								 engine));
> > +	u32 ctx_desc = lower_32_bits(to_intel_context(client->owner,
> > +						      engine)->lrc_desc);
> 
> ..hum. Just pointing out that this code is horrible ;)

One of the goals of having rq->hw_context was to avoid the pointer
dance, such as exemplified by intel_lr_context_descriptor(). But here
we should just stick the ctx_desc in the intel_guc_client.
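(A sketch of that suggestion, with the ctx_desc[] member invented here
for illustration and not part of the quoted patch: give struct
intel_guc_client a per-engine cache filled in once at client setup,

	/* hypothetical member of struct intel_guc_client */
	u32 ctx_desc[I915_NUM_ENGINES];

	for_each_engine(engine, dev_priv, id)
		client->ctx_desc[id] =
			lower_32_bits(to_intel_context(client->owner,
						       engine)->lrc_desc);

after which inject_preempt_context() can read
client->ctx_desc[engine->id] instead of chasing
owner -> context -> lrc_desc on every preemption.)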
> > @@ -708,7 +706,7 @@ static void guc_dequeue(struct intel_engine_cs *engine)
> >  		struct i915_request *rq, *rn;
> > 
> >  		list_for_each_entry_safe(rq, rn, &p->requests, sched.link) {
> > -			if (last && rq->gem_context != last->gem_context) {
> > +			if (last && rq->hw_context != last->hw_context) {
> 
> Same thing really since this is already per engine.

Not always...

> >  				if (port == last_port) {
> >  					__list_del_many(&p->requests,
> >  							&rq->sched.link);
> > @@ -991,7 +989,8 @@ static void guc_fill_preempt_context(struct intel_guc *guc)
> >  	enum intel_engine_id id;
> > 
> >  	for_each_engine(engine, dev_priv, id) {
> > -		struct intel_context *ce = &client->owner->engine[id];
> > +		struct intel_context *ce =
> > +			to_intel_context(client->owner, engine);
> >  		u32 addr = intel_hws_preempt_done_address(engine);
> >  		u32 *cs;
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 0777226e65a6..e6725690b5b6 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -164,7 +164,8 @@
> >  #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
> > 
> >  static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
> > -					    struct intel_engine_cs *engine);
> > +					    struct intel_engine_cs *engine,
> > +					    struct intel_context *ce);
> 
> Feels a bit redundant to pass in everything but OK.

You do remember which patch this was split out of ;)

> >  {
> >  	return (IS_ENABLED(CONFIG_DRM_I915_GVT) &&
> > -		i915_gem_context_force_single_submission(ctx));
> > +		i915_gem_context_force_single_submission(ctx->gem_context));
> >  }
> 
> But the whole function change is again not needed I think.
> 
> > 
> > -static bool can_merge_ctx(const struct i915_gem_context *prev,
> > -			  const struct i915_gem_context *next)
> > +static bool can_merge_ctx(const struct intel_context *prev,
> > +			  const struct intel_context *next)
> 
> This one neither.

Consider what it means. The rule is that we are not allowed to submit
the same lrc descriptor to subsequent ports; that corresponds to the hw
context, not gem_context. The payoff, aside from the better semantics
now, is in subsequent patches.

> > @@ -686,14 +687,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >  			 * the same context (even though a different
> >  			 * request) to the second port.
> >  			 */
> > -			if (ctx_single_port_submission(last->gem_context) ||
> > -			    ctx_single_port_submission(rq->gem_context)) {
> > +			if (ctx_single_port_submission(last->hw_context) ||
> > +			    ctx_single_port_submission(rq->hw_context)) {
> >  				__list_del_many(&p->requests,
> >  						&rq->sched.link);
> >  				goto done;
> >  			}
> > 
> > -			GEM_BUG_ON(last->gem_context == rq->gem_context);
> > +			GEM_BUG_ON(last->hw_context == rq->hw_context);
> 
> And if you agree with the above then these two hunks do not have to be
> in the patch.

Respectfully disagree, as this is ground work for subsequent logical
contexts that float between engines.

> > +static struct intel_context *
> > +execlists_context_pin(struct intel_engine_cs *engine,
> > +		      struct i915_gem_context *ctx)
> >  {
> > -	struct intel_context *ce = &ctx->engine[engine->id];
> > +	struct intel_context *ce = to_intel_context(ctx, engine);
> > 
> >  	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
> > -	GEM_BUG_ON(ce->pin_count == 0);
> > -
> > -	if (--ce->pin_count)
> > -		return;
> > 
> > -	intel_ring_unpin(ce->ring);
> > +	if (likely(ce->pin_count++))
> > +		return ce;
> > +	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
> > 
> > -	ce->state->obj->pin_global--;
> > -	i915_gem_object_unpin_map(ce->state->obj);
> > -	i915_vma_unpin(ce->state);
> > +	ce->ops = &execlists_context_ops;
> 
> If ops are set on pin it would be good to unset them on destroy.

No point on destroy, since it's being destroyed.

> Hm, or even set them not on pin but on creation.

engine->context_pin() is our intel_context factory, pin is the creation
function. Definitely a bit asymmetrical; the key bit was having
ce->unpin() so that we didn't try to release the intel_context via a
stale engine reference.

What I have been considering is passing intel_context to
i915_request_alloc() instead and trying to avoid that asymmetry by
managing the creation in the caller and separating it from pin().

> Not a huge deal but it is a
> bit unbalanced, and also if ce->ops is also a guard to distinguish
> possible from impossible engines then it is a bit more strange.

Not impossible engines in this regard, unused contexts.
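(For reference, the shape implied by ce->ops in the hunks above, as a
sketch since the actual ops table isn't quoted in this thread: a small
vtable hung off the intel_context,

	struct intel_context_ops {
		void (*unpin)(struct intel_context *ce);
		void (*destroy)(struct intel_context *ce);
	};

	static const struct intel_context_ops execlists_context_ops = {
		.unpin = execlists_context_unpin,
		.destroy = execlists_context_destroy,
	};

so releasing a context goes through ce->ops rather than a possibly
stale engine pointer, and a NULL ce->ops doubles as the "never used on
this engine" gate tested in i915_gem_context_free() above.)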
> > 
> > -	i915_gem_context_put(ctx);
> > +	return __execlists_context_pin(engine, ctx, ce);
> >  }
> > 
> >  static int execlists_request_alloc(struct i915_request *request)
> >  {
> > -	struct intel_engine_cs *engine = request->engine;
> > -	struct intel_context *ce = &request->gem_context->engine[engine->id];
> > -	int ret;
> > +	int err;
> > 
> > -	GEM_BUG_ON(!ce->pin_count);
> > +	GEM_BUG_ON(!request->hw_context->pin_count);
> > 
> >  	/* Flush enough space to reduce the likelihood of waiting after
> >  	 * we start building the request - in which case we will just
> > @@ -1388,9 +1413,9 @@ static int execlists_request_alloc(struct i915_request *request)
> >  	 */
> >  	request->reserved_space += EXECLISTS_REQUEST_SIZE;
> > 
> > -	ret = intel_ring_wait_for_space(request->ring, request->reserved_space);
> > -	if (ret)
> > -		return ret;
> > +	err = intel_ring_wait_for_space(request->ring, request->reserved_space);
> > +	if (err)
> > +		return err;
> 
> Rename of ret to err for a smaller diff is avoidable.

A couple of lines for trying to harmonise between ret and err.

> > @@ -1288,32 +1305,37 @@ intel_ring_context_pin(struct intel_engine_cs *engine,
> > 
> >  	i915_gem_context_get(ctx);
> > 
> > -out:
> >  	/* One ringbuffer to rule them all */
> > -	return engine->buffer;
> > +	GEM_BUG_ON(!engine->buffer);
> > +	ce->ring = engine->buffer;
> > +
> > +	return ce;
> > 
> >  err:
> >  	ce->pin_count = 0;
> 
> Strictly speaking this should be done in the caller.

Sssh. Tail call, think of it as a single contiguous function ;)

> > @@ -1323,10 +1345,6 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
> > 
> >  	intel_engine_setup_common(engine);
> > 
> > -	err = intel_engine_init_common(engine);
> > -	if (err)
> > -		goto err;
> > -
> >  	ring = intel_engine_create_ring(engine, 32 * PAGE_SIZE);
> >  	if (IS_ERR(ring)) {
> >  		err = PTR_ERR(ring);
> > @@ -1341,8 +1359,14 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
> >  	GEM_BUG_ON(engine->buffer);
> >  	engine->buffer = ring;
> > 
> > +	err = intel_engine_init_common(engine);
> > +	if (err)
> > +		goto err_unpin;
> > +
> 
> This ordering change doesn't look like it has to be done in this patch.

It does. Do you want to take another look ;) The problem here is that
the legacy ringbuffer submission uses a common engine->buffer, and the
kernel contexts are pinned in init_common; so we have to create that
buffer first.

> > diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > index e6d4b882599a..47a9de741c5f 100644
> > --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> > @@ -243,6 +243,11 @@ struct drm_i915_private *mock_gem_device(void)
> >  	if (!i915->kernel_context)
> >  		goto err_engine;
> > 
> > +	/* Fake the initial load for the powersaving context */
> > +	i915->engine[RCS]->last_retired_context =
> > +		intel_context_pin(i915->kernel_context, i915->engine[RCS]);
> > +	GEM_BUG_ON(IS_ERR_OR_NULL(i915->engine[RCS]->last_retired_context));
> 
> What is this?

Keeping selftests happy. I was asserting engine->last_retired_context
was set while chasing ce->ops, and found an instance where it wasn't,
which caused this patch to blow up. And had I written
intel_engine_has_kernel_context() differently, this chunk wouldn't have
been required.
-Chris