On Thu, Nov 03, 2016 at 10:34:26AM +0000, Tvrtko Ursulin wrote:
> 
> On 02/11/2016 17:50, Chris Wilson wrote:
> >Defer the transfer from the client's timeline onto the execution
> >timeline from the point of readiness to the point of actual submission.
> >For example, in execlists, a request is finally submitted to hardware
> >when the hardware is ready, and only put onto the hardware queue when
> >the request is ready. By deferring the transfer, we ensure that the
> >timeline is maintained in retirement order if we decide to queue the
> >requests onto the hardware in a different order than fifo.
> >
> >Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> >---
> > drivers/gpu/drm/i915/i915_gem_request.c    | 36 ++++++++++++++++++------------
> > drivers/gpu/drm/i915/i915_gem_request.h    |  2 ++
> > drivers/gpu/drm/i915/i915_guc_submission.c |  2 ++
> > drivers/gpu/drm/i915/intel_lrc.c           |  5 +++++
> > drivers/gpu/drm/i915/intel_ringbuffer.c    |  2 ++
> > 5 files changed, 33 insertions(+), 14 deletions(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> >index 1ae5a2f8953f..05544dec5de9 100644
> >--- a/drivers/gpu/drm/i915/i915_gem_request.c
> >+++ b/drivers/gpu/drm/i915/i915_gem_request.c
> >@@ -307,25 +307,16 @@ static u32 timeline_get_seqno(struct i915_gem_timeline *tl)
> > 	return atomic_inc_return(&tl->next_seqno);
> > }
> >
> >-static int __i915_sw_fence_call
> >-submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> >+void __i915_gem_request_submit(struct drm_i915_gem_request *request)
> > {
> >-	struct drm_i915_gem_request *request =
> >-		container_of(fence, typeof(*request), submit);
> > 	struct intel_engine_cs *engine = request->engine;
> > 	struct intel_timeline *timeline;
> >-	unsigned long flags;
> > 	u32 seqno;
> >
> >-	if (state != FENCE_COMPLETE)
> >-		return NOTIFY_DONE;
> >-
> > 	/* Transfer from per-context onto the global per-engine timeline */
> > 	timeline = engine->timeline;
> > 	GEM_BUG_ON(timeline == request->timeline);
> >-
> >-	/* Will be called from irq-context when using foreign DMA fences */
> >-	spin_lock_irqsave(&timeline->lock, flags);
> >+	assert_spin_locked(&timeline->lock);
> >
> > 	seqno = timeline_get_seqno(timeline->common);
> > 	GEM_BUG_ON(!seqno);
> >@@ -345,15 +336,32 @@ submit_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> > 	GEM_BUG_ON(!request->global_seqno);
> > 	engine->emit_breadcrumb(request,
> > 				request->ring->vaddr + request->postfix);
> >-	engine->submit_request(request);
> >
> >-	spin_lock_nested(&request->timeline->lock, SINGLE_DEPTH_NESTING);
> >+	/* nested from submit_notify */
> >+	spin_lock_nested(&request->timeline->lock, 2);
> 
> Is this because lockdep does not differentiate between
> request->timeline->lock and engine->timeline->lock?

Yes. Worse is that the 2 comes from having different paths to this point
with their own nesting pattern.

	struct timeline {
		struct lock_class_key lockdep;
		struct mutex mutex;
	};

	__mutex_init(&timeline->mutex, "timeline", &timeline->lockdep);

? (Hopefully that works w/o lockdep enabled)

Better then would be

	i915_gem_timeline_create()
	i915_gem_timeline_create_global()

and just use one lock class for all user timelines, and a lock class per
engine on the global timeline.

Will try.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
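
For illustration, a rough sketch of the split Chris proposes above -- one
lock class shared by all user timelines, and a separate class per engine
for the global timeline -- could look like the following. Only the names
i915_gem_timeline_create()/i915_gem_timeline_create_global() come from the
mail; the sketch_timeline struct layout, the field names and where the
lock_class_key lives are assumptions made for this example, not the patch
that eventually landed.

	#include <linux/err.h>
	#include <linux/lockdep.h>
	#include <linux/slab.h>
	#include <linux/spinlock.h>

	/* Simplified stand-in for the real i915 timeline; layout is assumed. */
	struct sketch_timeline {
		spinlock_t lock;
		const char *name;
	};

	/* One lockdep class shared by every user (per-context) timeline. */
	static struct lock_class_key user_timeline_class;

	struct sketch_timeline *i915_gem_timeline_create(const char *name)
	{
		struct sketch_timeline *tl = kzalloc(sizeof(*tl), GFP_KERNEL);

		if (!tl)
			return ERR_PTR(-ENOMEM);

		tl->name = name;
		spin_lock_init(&tl->lock);
		/* All user timelines report the same class to lockdep. */
		lockdep_set_class_and_name(&tl->lock, &user_timeline_class,
					   "user-timeline");
		return tl;
	}

	/*
	 * Each engine's global timeline gets its own class (the caller would
	 * pass a per-engine lock_class_key, e.g. embedded in the engine).
	 */
	struct sketch_timeline *
	i915_gem_timeline_create_global(const char *engine_name,
					struct lock_class_key *engine_class)
	{
		struct sketch_timeline *tl = kzalloc(sizeof(*tl), GFP_KERNEL);

		if (!tl)
			return ERR_PTR(-ENOMEM);

		tl->name = engine_name;
		spin_lock_init(&tl->lock);
		/* A distinct class per engine for the execution timeline. */
		lockdep_set_class_and_name(&tl->lock, engine_class, engine_name);
		return tl;
	}

With distinct classes, lockdep treats engine->timeline->lock and
request->timeline->lock as different locks, so the spin_lock_nested(..., 2)
subclass workaround quoted above is no longer needed. Note also that
lockdep_set_class_and_name() compiles away to nothing without lockdep,
which addresses the "hopefully that works w/o lockdep enabled" concern.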