On Fri, Jun 13, 2014 at 04:37:59PM +0100, oscar.mateo@xxxxxxxxx wrote:
> From: Oscar Mateo <oscar.mateo@xxxxxxxxx>
>
> In the current Execlists feeding mechanism, full preemption is not
> supported yet: only lite-restores are allowed (that is: the GPU
> simply samples a new tail pointer for the context currently in
> execution).
>
> But we have identified a scenario in which a full preemption occurs:
> 1) We submit two contexts for execution (A & B).
> 2) The GPU finishes with the first one (A), switches to the second one
> (B) and informs us.
> 3) We submit B again (hoping to cause a lite restore) together with C,
> but in the time we spend writing to the ELSP, the GPU finishes B.
> 4) The GPU starts executing B again (since we told it so).
> 5) We receive a B finished interrupt and, mistakenly, we submit C (again)
> and D, causing a full preemption of B.
>
> By keeping better track of our submissions, we can avoid the scenario
> described above.

How? I don't see a way to fundamentally avoid the above race, and I
don't really see an issue with it - the GPU should notice that there
isn't really any new work and then switch to C. Or am I completely
missing the point here?

With no clue at all this looks really scary.

> v2: elsp_submitted belongs in the new intel_ctx_submit_request. Several
> rebase changes.
>
> Signed-off-by: Oscar Mateo <oscar.mateo@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 28 ++++++++++++++++++++++++----
>  drivers/gpu/drm/i915/intel_lrc.h |  2 ++
>  2 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 290391c..f388b28 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -248,6 +248,7 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  		else if (req0->ctx == cursor->ctx) {
>  			/* Same ctx: ignore first request, as second request
>  			 * will update tail past first request's workload */
> +			cursor->elsp_submitted = req0->elsp_submitted;
>  			list_del(&req0->execlist_link);
>  			queue_work(dev_priv->wq, &req0->work);
>  			req0 = cursor;
> @@ -257,8 +258,14 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  		}
>  	}
>
> +	WARN_ON(req1 && req1->elsp_submitted);
> +
>  	BUG_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
>  			req1? req1->ctx : NULL, req1? req1->tail : 0));

Aside: No BUG_ON except when you can prove that the kernel will die
within the current function anyway. I've seen too many cases where
people sprinkle BUG_ON instead of WARN_ON for not-completely-lethal
issues with the argument that stopping the box helps debugging. That's
kinda true for initial development, but not true when shipping: The
usual result is a frustrated user/customer looking at a completely
frozen box (because someone managed to hit the BUG_ON within a spinlock
that the irq handler requires, and then the machine is gone) and an
equally frustrated developer half a world away. A dying kernel that
spews useful crap into the logs with its last breath is _much_ better,
even when you know that there's no way we can ever recover from a given
situation.
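To make that concrete, an untested sketch of what I'd rather see here -
warn and bail instead of stopping the machine (this assumes
execlists_submit_context() keeps returning non-zero on failure, which is
what the BUG_ON above already implies):

	/*
	 * Sketch only: shout into the logs and drop this submission on
	 * the floor instead of killing the box. WARN_ON() evaluates to
	 * the truth value of its condition, so this bails out on failure.
	 */
	if (WARN_ON(execlists_submit_context(ring, req0->ctx, req0->tail,
					     req1 ? req1->ctx : NULL,
					     req1 ? req1->tail : 0)))
		return;

Whether silently dropping the work is the right recovery is a separate
discussion, but at least the machine stays alive and the logs tell us
what went wrong.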
</rant>

Cheers, Daniel

> +
> +	req0->elsp_submitted++;
> +	if (req1)
> +		req1->elsp_submitted++;
>  }
>
>  static bool execlists_check_remove_request(struct intel_engine_cs *ring,
> @@ -275,9 +282,13 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
>  		struct drm_i915_gem_object *ctx_obj =
>  				head_req->ctx->engine[ring->id].obj;
>  		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
> -			list_del(&head_req->execlist_link);
> -			queue_work(dev_priv->wq, &head_req->work);
> -			return true;
> +			WARN(head_req->elsp_submitted == 0,
> +			     "Never submitted head request\n");
> +			if (--head_req->elsp_submitted <= 0) {
> +				list_del(&head_req->execlist_link);
> +				queue_work(dev_priv->wq, &head_req->work);
> +				return true;
> +			}
>  		}
>  	}
>
> @@ -310,7 +321,16 @@ void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring)
>  		status_id = I915_READ(RING_CONTEXT_STATUS_BUF(ring) +
>  				(read_pointer % 6) * 8 + 4);
>
> -		if (status & GEN8_CTX_STATUS_COMPLETE) {
> +		if (status & GEN8_CTX_STATUS_PREEMPTED) {
> +			if (status & GEN8_CTX_STATUS_LITE_RESTORE) {
> +				if (execlists_check_remove_request(ring, status_id))
> +					WARN(1, "Lite Restored request removed from queue\n");
> +			} else
> +				WARN(1, "Preemption without Lite Restore\n");
> +		}
> +
> +		if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) ||
> +		    (status & GEN8_CTX_STATUS_ELEMENT_SWITCH)) {
>  			if (execlists_check_remove_request(ring, status_id))
>  				submit_contexts++;
>  		}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 7949dff..ee877aa 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -51,6 +51,8 @@ struct intel_ctx_submit_request {
>
>  	struct list_head execlist_link;
>  	struct work_struct work;
> +
> +	int elsp_submitted;
>  };
>
>  void intel_execlists_handle_ctx_events(struct intel_engine_cs *ring);
> --
> 1.9.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx