I am working on a customized kernel based on 5.4.39. The issue can only be reproduced when the system is low on memory and tries to reclaim it. The erroneous double insert of the i915_request then comes from the i915_gem_shrink() path:

 i915_request_enable_breadcrumb+0x136/0x14a
 dma_fence_enable_sw_signaling+0x47/0xb0
 enable_signaling+0x66/0x80
 i915_active_wait+0xc1/0x150
 __i915_vma_unbind+0x17/0x1a0
 i915_vma_unbind+0x47/0xc0
 i915_gem_object_unbind+0x189/0x290
 i915_gem_shrink+0x139/0x460
 ? __pm_runtime_resume+0x53/0x70
 i915_gem_shrinker_scan+0x9c/0xb0
 do_shrink_slab+0x14f/0x2b0
 shrink_slab+0xa7/0x2a0
 shrink_node+0xd1/0x410
 balance_pgdat+0x2b7/0x500
 kswapd+0x1e2/0x3b0

I believe it's not related to ce->signal_lock; the lock should work normally.

i915_request_enable_breadcrumb() can be invoked from several contexts: from ioctl(), from interrupt context, and from the memory swap thread (kswapd in the trace above). I suggest adding a double check before inserting the i915_request into the list. It's hard to guarantee that every call path is valid, but adding the check as protection avoids the critical effect. Adding the same i915_request twice triggers a dead loop in signal_irq_work(), and the loop never breaks. It keeps going over the i915_request until its hwsp_seqno is changed. An invalid address access error is then reported, followed by a system panic.

Thanks,
Dong

-----Original Message-----
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
Sent: Friday, December 10, 2021 4:51 PM
To: Yang, Dong <dong.yang@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re: [PATCH] drm/i915/gt: Do not add same i915_request to intel_context twice

On 10/12/2021 01:31, dong.yang@xxxxxxxxx wrote:
> From: "Yang, Dong" <dong.yang@xxxxxxxxx>
>
> With unknow race condition, the i915_request will be added

What do you mean with unknown here?

> to intel_context list twice, and result in system panic.
>
> If node alreay exist then do not add it again.

Note the call chains are under ce->signal_lock and protecting from double add AFAICT:

static void insert_breadcrumb(struct i915_request *rq)
{
...
	if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
		return;
...
	set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);

bool i915_request_enable_breadcrumb(struct i915_request *rq)
{
...
	spin_lock(&ce->signal_lock);
	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
		insert_breadcrumb(rq);
	spin_unlock(&ce->signal_lock);

void i915_request_cancel_breadcrumb(struct i915_request *rq)
{
...
	spin_lock(&ce->signal_lock);
	if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
		spin_unlock(&ce->signal_lock);
		return;
	}

void intel_context_remove_breadcrumbs(struct intel_context *ce,
				      struct intel_breadcrumbs *b)
{
...
	spin_lock_irqsave(&ce->signal_lock, flags);

	if (list_empty(&ce->signals))
		goto unlock;

	list_for_each_entry_safe(rq, rn, &ce->signals, signal_link) {
		GEM_BUG_ON(!__i915_request_is_complete(rq));
		if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL,
					&rq->fence.flags))
			continue;

The last one in signal_irq_work is guarded by the __i915_request_is_complete check.

So I think more context is needed on how you found this may be an issue.

Regards,

Tvrtko

>
> Signed-off-by: Yang, Dong <dong.yang@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index 209cf265bf74..9c7bc060d2ae 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -387,6 +387,9 @@ static void insert_breadcrumb(struct i915_request *rq)
>  		}
>  	}
>
> +	if (&rq->signal_link == pos)
> +		return;
> +
>  	i915_request_get(rq);
>  	list_add_rcu(&rq->signal_link, pos);
>  	GEM_BUG_ON(!check_signal_order(ce, rq));
>
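
The following is a minimal userspace sketch, not i915 code, that makes the two positions in this thread concrete. It re-implements the pointer updates of the kernel's list_add() to show two things. First, linking the same node twice turns the list into a cycle that a head-terminated walk never leaves, which is consistent with the dead loop described above. Second, a flag that is tested before the insert and set after it, in the style of the insert_breadcrumb() excerpt, rejects the second insert. The names fake_request, guarded_insert(), list_walk() and the demo_*() helpers are hypothetical. The on_list field stands in for I915_FENCE_FLAG_SIGNAL. The ce->signal_lock locking is deliberately left out, so this says nothing about which race, if any, happens in the driver.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for the kernel's struct list_head. list_add()
 * mirrors the pointer updates of list_add() in <linux/list.h>:
 * the new node is linked right after `head`. */
struct list_head {
	struct list_head *next, *prev;
};

static void init_list_head(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add(struct list_head *n, struct list_head *head)
{
	struct list_head *next = head->next;

	next->prev = n;
	n->next = next;
	n->prev = head;
	head->next = n;
}

/* Walk the list from head. Returns the number of nodes seen, or -1 if
 * the walk has not come back to head after `limit` steps, i.e. the
 * list has been corrupted into a cycle that excludes head. */
static int list_walk(const struct list_head *head, int limit)
{
	const struct list_head *p;
	int steps = 0;

	for (p = head->next; p != head; p = p->next)
		if (++steps > limit)
			return -1;
	return steps;
}

/* Hypothetical request: signal_link plays the role of rq->signal_link,
 * on_list plays the role of I915_FENCE_FLAG_SIGNAL. */
struct fake_request {
	struct list_head signal_link;
	bool on_list;
};

/* Flag-guarded insert: bail out if the node is already linked,
 * otherwise link it and set the flag. In the driver the test and the
 * set happen under ce->signal_lock; this sketch is single-threaded. */
static void guarded_insert(struct fake_request *rq, struct list_head *head)
{
	if (rq->on_list)
		return;
	list_add(&rq->signal_link, head);
	rq->on_list = true;
}

/* Unguarded double insert of `a`: head -> a -> b -> a -> b ...,
 * so the walk never returns to head and reports -1. */
static int demo_unguarded(void)
{
	struct list_head head;
	struct fake_request a = { .on_list = false };
	struct fake_request b = { .on_list = false };

	init_list_head(&head);
	list_add(&a.signal_link, &head);
	list_add(&b.signal_link, &head);
	list_add(&a.signal_link, &head); /* same node linked twice */
	return list_walk(&head, 16);
}

/* Same sequence through guarded_insert(): the second insert of `a`
 * is ignored, so the walk sees exactly two nodes. */
static int demo_guarded(void)
{
	struct list_head head;
	struct fake_request a = { .on_list = false };
	struct fake_request b = { .on_list = false };

	init_list_head(&head);
	guarded_insert(&a, &head);
	guarded_insert(&b, &head);
	guarded_insert(&a, &head);
	return list_walk(&head, 16);
}
```

demo_unguarded() returns -1 and demo_guarded() returns 2. The step limit in list_walk() exists only so that the sketch terminates. A walker that relies on reaching the list head, as kernel list iterators do, would spin forever on the corrupted list.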