On Thu, Oct 08, 2015 at 01:32:07PM +0100, Chris Wilson wrote:
> On Tue, Oct 06, 2015 at 03:52:01PM +0100, Nick Hoath wrote:
> > There is a desire to simplify the i915 driver by reducing the number of
> > different code paths introduced by the LRC / execlists support. As the
> > execlists request is now part of the gem request it is possible and
> > desirable to unify the request life-cycles for execlist and legacy
> > requests.
> >
> > Added a context complete flag to a request which gets set during the
> > context switch interrupt.
>
> Wrong approach. Legacy uses the request itself as the indicator for when
> the context is complete; if you were to take the same approach for LRC
> we would not need to add yet more execlist-specific state.
>
> Lines of code are reduced by keeping the GEM semantics the same and just
> having execlists hold an independent ref on the request for its async
> submission approach.
>
> > Added a function i915_gem_request_retireable(). A request is considered
> > retireable if its seqno has passed (i.e. the request has completed) and
> > either it was never submitted to the ELSP or its context completed. This
> > ensures that context save is carried out before the last request for a
> > context is considered retireable. retire_requests_ring() now uses
> > i915_gem_request_retireable() rather than request_complete() when
> > deciding which requests to retire. Requests that were not waiting for a
> > context switch interrupt (either as a result of being merged into a
> > following request or by being a legacy request) will be considered
> > retireable as soon as their seqno has passed.
> >
> > Removed the extra request reference held for the execlist request.
> >
> > Removed intel_execlists_retire_requests() and all references to
> > intel_engine_cs.execlist_retired_req_list.
> >
> > Moved context unpinning into retire_requests_ring() for now. Further
> > work is pending for the context pinning - this patch should allow us to
> > use the active list to track context and ring buffer objects later.
> >
> > Changed gen8_cs_irq_handler() so that notify_ring() is called when
> > contexts complete as well as when a user interrupt occurs so that
> > notification happens when a request is complete and context save has
> > finished.
> >
> > v2: Rebase over the read-read optimisation changes
> >
> > v3: Reworked IRQ handler after removing IRQ handler cleanup patch
> >
> > v4: Fixed various pin leaks
> >
> > Issue: VIZ-4277
> > Signed-off-by: Thomas Daniel <thomas.daniel@xxxxxxxxx>
> > Signed-off-by: Nick Hoath <nicholas.hoath@xxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h         |  6 +++
> >  drivers/gpu/drm/i915/i915_gem.c         | 67 +++++++++++++++++++++------
> >  drivers/gpu/drm/i915/i915_irq.c         | 81 +++++++++++++++++++++------------
> >  drivers/gpu/drm/i915/intel_lrc.c        | 43 +++--------------
> >  drivers/gpu/drm/i915/intel_lrc.h        |  2 +-
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
> >  6 files changed, 118 insertions(+), 82 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index fbf0ae9..3d217f9 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2262,6 +2262,12 @@ struct drm_i915_gem_request {
> >  	/** Execlists no. of times this request has been sent to the ELSP */
> >  	int elsp_submitted;
> >
> > +	/**
> > +	 * Execlists: whether this request's context has completed after
> > +	 * submission to the ELSP
> > +	 */
> > +	bool ctx_complete;
> > +
> >  };
> >
> >  int i915_gem_request_alloc(struct intel_engine_cs *ring,
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 52642af..fc82171 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1386,6 +1386,24 @@ __i915_gem_request_retire__upto(struct drm_i915_gem_request *req)
> >  					   typeof(*tmp), list);
> >
> >  		i915_gem_request_retire(tmp);
> > +
> > +		if (i915.enable_execlists) {
> > +			struct intel_context *ctx = tmp->ctx;
> > +			struct drm_i915_private *dev_priv =
> > +				engine->dev->dev_private;
> > +			unsigned long flags;
> > +			struct drm_i915_gem_object *ctx_obj =
> > +				ctx->engine[engine->id].state;
> > +
> > +			spin_lock_irqsave(&engine->execlist_lock, flags);
> > +
> > +			if (ctx_obj && (ctx != engine->default_context))
> > +				intel_lr_context_unpin(tmp);
> > +
> > +			intel_runtime_pm_put(dev_priv);
> > +			spin_unlock_irqrestore(&engine->execlist_lock, flags);
> > +		}
>
> Why here? Surely you intended this to be called by
> i915_gem_request_retire(). The runtime pm interaction is wrong; that is
> handled by GPU busyness tracking. But by doing this at this juncture you
> now increase the frequency at which we have to recreate the iomapping,
> most noticeable on bsw+, and take more spinlocks unnecessarily.
>
> Also, since you no longer do reference tracking for the
> execlists_queue, tmp has just been freed.
>
> >  		} while (tmp != req);
> >
> >  	WARN_ON(i915_verify_lists(engine->dev));
> > @@ -2359,6 +2377,12 @@ void i915_vma_move_to_active(struct i915_vma *vma,
> >  		list_move_tail(&vma->mm_list, &vma->vm->active_list);
> >  }
> >
> > +static bool i915_gem_request_retireable(struct drm_i915_gem_request *req)
> > +{
> > +	return (i915_gem_request_completed(req, true) &&
> > +		(!req->elsp_submitted || req->ctx_complete));
>
> I disagree with this completely. A request must only be considered
> complete when its seqno has passed. The context should be tracking the
> request and not the other way around (like legacy does).
>
> The approach is just wrong. The lifecycle of the request is not the
> bloat in execlists, and keeping an extra reference on the request struct
> itself until execlists is able to reap them amortizes the cost of
> the spinlock and atomic operations.

Hm, that's a slightly different approach than what I suggested with just
creating special ctx-switch requests (since with the scheduler we'll
reorder, so we can't do the same trick as with legacy ctx switching any
more). This trick here otoh extends the lifetime of a request until it's
kicked out, so that the shrinker doesn't reap the request object too
early. How does holding an additional ref on the request prevent this? I
mean I don't see how that would prevent special-casing in the eviction
code ...

Or is your plan to put the final waiting for the ctx to idle into the
->vma_unbind hook? That seems again a bit a quirk ...
-Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx