On Mon, Dec 07, 2020 at 07:38:10PM +0000, Chris Wilson wrote: > The issue with stale virtual breadcrumbs remain. Now we have the problem > that if the irq-signaler is still referencing the stale breadcrumb as we > transfer it to a new sibling, the list becomes spaghetti. This is a very > small window, but that doesn't stop it being hit infrequently. To > prevent the lists being tangled (the iterator starting on one engine's > b->signalers but walking onto another list), always decouple the virtual > breadcrumb on schedule-out and make sure that the walker has stepped out > of the lists. > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> Makes sense to me. Reviewed-by: Matthew Brost <matthew.brost@xxxxxxxxx> > --- > drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 5 +++-- > drivers/gpu/drm/i915/gt/intel_lrc.c | 15 +++++++++++++++ > 2 files changed, 18 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c > index 00918300f53f..63900edbde88 100644 > --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c > +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c > @@ -454,15 +454,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq) > { > struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs; > struct intel_context *ce = rq->context; > + unsigned long flags; > bool release; > > if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) > return; > > - spin_lock(&ce->signal_lock); > + spin_lock_irqsave(&ce->signal_lock, flags); > list_del_rcu(&rq->signal_link); > release = remove_signaling_context(b, ce); > - spin_unlock(&ce->signal_lock); > + spin_unlock_irqrestore(&ce->signal_lock, flags); > if (release) > intel_context_put(ce); > > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c > index 0b7e86240f3b..b3db16b2a5a4 100644 > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c > @@ -1388,6 +1388,21 @@ static inline void execlists_schedule_in(struct i915_request *rq, int idx) > static void kick_siblings(struct i915_request *rq, struct intel_context *ce) > { > struct virtual_engine *ve = container_of(ce, typeof(*ve), context); > + struct intel_engine_cs *engine = rq->engine; > + > + /* Flush concurrent rcu iterators in signal_irq_work */ > + if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags)) { > + /* > + * After this point, the rq may be transferred to a new > + * sibling, so before we clear ce->inflight make sure that > + * the context has been removed from the b->signalers and > + * furthermore we need to make sure that the concurrent > + * iterator in signal_irq_work is no longer following > + * ce->signal_link. > + */ > + i915_request_cancel_breadcrumb(rq); > + irq_work_sync(&engine->breadcrumbs->irq_work); > + } > > if (READ_ONCE(ve->request)) > tasklet_hi_schedule(&ve->base.execlists.tasklet); > -- > 2.20.1 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx > https://lists.freedesktop.org/mailman/listinfo/intel-gfx _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx