Quoting Tvrtko Ursulin (2019-04-25 11:35:01) > > On 25/04/2019 10:19, Chris Wilson wrote: > > Currently there is an underlying assumption that i915_request_unsubmit() > > is synchronous wrt the GPU -- that is the request is no longer in flight > > as we remove it. In the near future that may change, and this may upset > > our signaling as we can process an interrupt for that request while it > > is no longer in flight. > > > > CPU0 CPU1 > > intel_engine_breadcrumbs_irq > > (queue request completion) > > i915_request_cancel_signaling > > ... ... > > i915_request_enable_signaling > > dma_fence_signal > > > > Hence in the time it took us to drop the lock to signal the request, a > > preemption event may have occurred and re-queued the request. In the > > process, that request would have seen I915_FENCE_FLAG_SIGNAL clear and > > so reused the rq->signal_link that was in use on CPU0, leading to bad > > pointer chasing in intel_engine_breadcrumbs_irq. > > > > A related issue was that if someone started listening for a signal on a > > completed but no longer in-flight request, we missed the opportunity to > > immediately signal that request. > > > > Furthermore, as intel_contexts may be immediately released during > > request retirement, in order to be entirely sure that > > intel_engine_breadcrumbs_irq may no longer dereference the intel_context > > (ce->signals and ce->signal_link), we must wait for irq spinlock. > > > > In order to prevent the race, we use a bit in the fence.flags to signal > > the transfer onto the signal list inside intel_engine_breadcrumbs_irq. > > For simplicity, we use the DMA_FENCE_FLAG_SIGNALED_BIT as it then > > quickly signals to any outside observer that the fence is indeed signaled. > > > > Fixes: 52c0fdb25c7c ("drm/i915: Replace global breadcrumbs with per-context interrupt tracking") > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> > > --- > > drivers/dma-buf/dma-fence.c | 1 + > > drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 58 +++++++++++++-------- > > drivers/gpu/drm/i915/i915_request.c | 1 + > > 3 files changed, 39 insertions(+), 21 deletions(-) > > > > diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c > > index 3aa8733f832a..9bf06042619a 100644 > > --- a/drivers/dma-buf/dma-fence.c > > +++ b/drivers/dma-buf/dma-fence.c > > @@ -29,6 +29,7 @@ > > > > EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit); > > EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal); > > +EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled); > > > > static DEFINE_SPINLOCK(dma_fence_stub_lock); > > static struct dma_fence dma_fence_stub; > > diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c > > index 3cbffd400b1b..4283224249d4 100644 > > --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c > > +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c > > @@ -23,6 +23,7 @@ > > */ > > > > #include <linux/kthread.h> > > +#include <trace/events/dma_fence.h> > > #include <uapi/linux/sched/types.h> > > > > #include "i915_drv.h" > > @@ -83,6 +84,7 @@ static inline bool __request_completed(const struct i915_request *rq) > > void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) > > { > > struct intel_breadcrumbs *b = &engine->breadcrumbs; > > + const ktime_t timestamp = ktime_get(); > > struct intel_context *ce, *cn; > > struct list_head *pos, *next; > > LIST_HEAD(signal); > > @@ -104,6 +106,11 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) > > > > GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_SIGNAL, > > &rq->fence.flags)); > > + clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); > > + > > + if (test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, > > + &rq->fence.flags)) > > + continue; > > > > /* > > * Queue for execution after dropping the signaling > > @@ -111,14 +118,6 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) > > * more signalers to the same context or engine. > > */ > > i915_request_get(rq); > > - > > - /* > > - * We may race with direct invocation of > > - * dma_fence_signal(), e.g. i915_request_retire(), > > - * so we need to acquire our reference to the request > > - * before we cancel the breadcrumb. > > - */ > > - clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags); > > list_add_tail(&rq->signal_link, &signal); > > } > > > > @@ -140,8 +139,21 @@ void intel_engine_breadcrumbs_irq(struct intel_engine_cs *engine) > > list_for_each_safe(pos, next, &signal) { > > struct i915_request *rq = > > list_entry(pos, typeof(*rq), signal_link); > > + struct dma_fence_cb *cur, *tmp; > > + > > + trace_dma_fence_signaled(&rq->fence); > > + > > + rq->fence.timestamp = timestamp; > > + set_bit(DMA_FENCE_FLAG_TIMESTAMP_BIT, &rq->fence.flags); > > + > > + spin_lock(&rq->lock); > > + list_for_each_entry_safe(cur, tmp, &rq->fence.cb_list, node) { > > + INIT_LIST_HEAD(&cur->node); > > + cur->func(&rq->fence, cur); > > + } > > + INIT_LIST_HEAD(&rq->fence.cb_list); > > + spin_unlock(&rq->lock); > > > > - dma_fence_signal(&rq->fence); > > I posted some comments on this patch already. In essence it was a > suggestion to not open-code-and-optimize dma_fence_signal, but split it > into two low-level helpers and export from the parent location. Like > __dma_fence_maybe/start_signal and __dma_fence_finish_signal. > > bool __dma_fence_start_signal(...) > { > return test_and_set_bit(DMA_FENCE_FLAG_SIGNALED_BIT, > fence->flags); > } > EXPORT_SYMBOL(...) > > void __dma_fence_finish_signal(...) > { > trace_dma_fence_signaled... > timestamp... > callbacks... > } > EXPORT_SYMBOL(...) > > This way we don't add coupling to low level implementation details in i915. You mean midlayer implementation details :-p The alternative I mentioned was that we use a redundant bit. But that has duplicity of purpose. The patch needs to be stable-friendly for the fixes, so I opted for taking control. -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx