On Wed, May 8, 2019 at 2:06 PM Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote: > > Currently there is an underlying assumption that i915_request_unsubmit() > is synchronous wrt the GPU -- that is the request is no longer in flight > as we remove it. In the near future that may change, and this may upset > our signaling as we can process an interrupt for that request while it > is no longer in flight. > > CPU0 CPU1 > intel_engine_breadcrumbs_irq > (queue request completion) > i915_request_cancel_signaling > ... ... > i915_request_enable_signaling > dma_fence_signal > > Hence in the time it took us to drop the lock to signal the request, a > preemption event may have occurred and re-queued the request. In the > process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and > so reused the rq->signal_link that was in use on CPU0, leading to bad > pointer chasing in intel_engine_breadcrumbs_irq. > > A related issue was that if someone started listening for a signal on a > completed but no longer in-flight request, we missed the opportunity to > immediately signal that request. > > Furthermore, as intel_contexts may be immediately released during > request retirement, in order to be entirely sure that > intel_engine_breadcrumbs_irq may no longer dereference the intel_context > (ce->signals and ce->signal_link), we must wait for irq spinlock. > > In order to prevent the race, we use a bit in the fence.flags to signal > the transfer onto the signal list inside intel_engine_breadcrumbs_irq. > For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then > quickly signals to any outside observer that the fence is indeed signaled. > > v2: Sketch out potential dma-fence API for manual signaling > v3: And the test_and_set_bit() > > Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking") > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> Ok chatted a bit with Chris on irc, and we already allow drivers to pass whatever spinlock is suitable for their request/fence handling/retiring, so allowing to split up this all makes sense I think. Has my ack on the approach. I think it'd be good to spell out the optimization this allows (synchronous cleanup instead of refcounting all. the. things.) plus showcase the link between the fence->lock pointer and the split-up __dma_signal functions in the kerneldoc. Definitely for the core patch. Also not sure why __notify and __timestamp has a double underscore in the middle. That color choice confuses me a bit :-) Cheers, Daniel > --- > drivers/dma-buf/dma-fence.c | 1 + > drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 78 +++++++++++++++------ > drivers/gpu/drm/i915/i915_request.c | 1 + > drivers/gpu/drm/i915/intel_guc_submission.c | 1 - > 4 files changed, 59 insertions(+), 22 deletions(-) > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c > index 3aa8733f832a..9bf06042619a 100644 > --- a/drivers/dma-buf/dma-fence.c > +++ b/drivers/dma-buf/dma-fence.c > @@ -29,6 +29,7 @@ > > EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit); > EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal); > +EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled); > > static DEFINE_SPINLOCK(dma_fence_stub_lock); > static struct dma_fence dma_fence_stub; > diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c > index fe455f01aa65..c092bdf5f0bf 100644 > --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c > +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c > @@ -23,6 +23,7 @@ > */ > > #include <linux/kthread.h> > +#include <trace/events/dma_fence.h> > #include <uapi/linux/sched/types.h> > > #include "i915_drv.h" > @@ -96,9 +97,39 @@ check_signal_order(struct intel_context *ce, struct i915_request *rq) > return true; > } > > +static bool > +__dma_fence_signal(struct dma_fence *fence) > +{ > + return !test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags); > +} > + > +static void > +__dma_fence_signal__timestamp(struct dma_fence *fence, ktime_t timestamp) > +{ > + fence->timestamp = timestamp; > + set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &fence->flags); > + trace_dma_fence_signaled(fence); > +} > + > +static void > +__dma_fence_signal__notify(struct dma_fence *fence) > +{ > + struct dma_fence_cb *cur, *tmp; > + > + lockdep_assert_held(fence->lock); > + lockdep_assert_irqs_disabled(); > + > + list_for_each_entry_safe(cur, tmp, &fence->cb_list, node) { > + INIT_LIST_HEAD(&cur->node); > + cur->func(fence, cur); > + } > + INIT_LIST_HEAD(&fence->cb_list); > +} > + > void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) > { > struct intel_breadcrumbs *b = &engine->breadcrumbs; > + const ktime_t timestamp = ktime_get(); > struct intel_context *ce, *cn; > struct list_head *pos, *next; > LIST_HEAD(signal); > @@ -122,6 +153,10 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) > > GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL, > &rq->fence.flags)); > + clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); > + > + if (!__dma_fence_signal(&rq->fence)) > + continue; > > /* > * Queue for execution after dropping the signaling > @@ -129,14 +164,6 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) > * more signalers to the same context or engine. > */ > i915_request_get(rq); > - > - /* > - * We may race with direct invocation of > - * dma_fence_signal(), e.g. i915_request_retire(), > - * so we need to acquire our reference to the request > - * before we cancel the breadcrumb. > - */ > - clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); > list_add_tail(&rq->signal_link, &signal); > } > > @@ -159,7 +186,12 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) > struct i915_request *rq = > list_entry(pos, typeof(*rq), signal_link); > > - dma_fence_signal(&rq->fence); > + __dma_fence_signal__timestamp(&rq->fence, timestamp); > + > + spin_lock(&rq->lock); > + __dma_fence_signal__notify(&rq->fence); > + spin_unlock(&rq->lock); > + > i915_request_put(rq); > } > } > @@ -261,19 +293,17 @@ void intel_engine_fini_breadcrumbs(struct intel_engine_cs *engine) > > bool i915_request_enable_breadcrumb(struct i915_request *rq) > { > - struct intel_breadcrumbs *b = &rq->engine->breadcrumbs; > - > - GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)); > + lockdep_assert_held(&rq->lock); > + lockdep_assert_irqs_disabled(); > > - if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags)) > - return true; > - > - spin_lock(&b->irq_lock); > - if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags) && > - !__request_completed(rq)) { > + if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags)) { > + struct intel_breadcrumbs *b = &rq->engine->breadcrumbs; > struct intel_context *ce = rq->hw_context; > struct list_head *pos; > > + spin_lock(&b->irq_lock); > + GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)); > + > __intel_breadcrumbs_arm_irq(b); > > /* > @@ -303,8 +333,8 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq) > GEM_BUG_ON(!check_signal_order(ce, rq)); > > set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); > + spin_unlock(&b->irq_lock); > } > - spin_unlock(&b->irq_lock); > > return !__request_completed(rq); > } > @@ -313,9 +343,15 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq) > { > struct intel_breadcrumbs *b = &rq->engine->breadcrumbs; > > - if (!test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) > - return; > + lockdep_assert_held(&rq->lock); > + lockdep_assert_irqs_disabled(); > > + /* > + * We must wait for b->irq_lock so that we know the interrupt handler > + * has released its reference to the intel_context and has completed > + * the DMA_FENCE_FLAG_SIGNALED_BIT/I915_FENCE_FLAG_SIGNAL dance (if > + * required). > + */ > spin_lock(&b->irq_lock); > if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) { > struct intel_context *ce = rq->hw_context; > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c > index fa955b7b6def..bed213148cbb 100644 > --- a/drivers/gpu/drm/i915/i915_request.c > +++ b/drivers/gpu/drm/i915/i915_request.c > @@ -437,6 +437,7 @@ void __i915_request_submit(struct i915_request *request) > set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags); > > if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) && > + !test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &request->fence.flags) && > !i915_request_enable_breadcrumb(request)) > intel_engine_queue_breadcrumbs(engine); > > diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c > index 380d83a2bfb6..ea0e3734d37c 100644 > --- a/drivers/gpu/drm/i915/intel_guc_submission.c > +++ b/drivers/gpu/drm/i915/intel_guc_submission.c > @@ -23,7 +23,6 @@ > */ > > #include <linux/circ_buf.h> > -#include <trace/events/dma_fence.h> > > #include "gt/intel_engine_pm.h" > #include "gt/intel_lrc_reg.h" > -- > 2.20.1 > > _______________________________________________ > dri-devel mailing list > dri-devel@xxxxxxxxxxxxxxxxxxxxx > https://lists.freedesktop.org/mailman/listinfo/dri-devel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel