Re: [PATCH] drm/i915/gt: Do not add same i915_request to intel_context twice


On 13/12/2021 01:53, Yang, Dong wrote:
I am working on a customized kernel based on 5.4.39. The issue can only be reproduced when the system is low on memory and tries to reclaim it; the erroneous double insert of the i915_request then comes from the i915_gem_shrink() path.

5.4 is quite old and there have been fixes to this code since. Any chance that you can repro on drm-tip? What project are you working on?

Is your bug perhaps similar to what c744d50363b7 ("drm/i915/gt: Split the breadcrumb spinlock between global and contexts") fixed? As the commit says:

"""
 Furthermore, this closes the race between enabling the signaling context
 while it is in the process of being signaled and removed:
"""


i915_request_enable_breadcrumb+0x136/0x14a
dma_fence_enable_sw_signaling+0x47/0xb0
enable_signaling+0x66/0x80
i915_active_wait+0xc1/0x150
__i915_vma_unbind+0x17/0x1a0
i915_vma_unbind+0x47/0xc0
i915_gem_object_unbind+0x189/0x290
i915_gem_shrink+0x139/0x460
? __pm_runtime_resume+0x53/0x70
i915_gem_shrinker_scan+0x9c/0xb0
do_shrink_slab+0x14f/0x2b0
shrink_slab+0xa7/0x2a0
shrink_node+0xd1/0x410
balance_pgdat+0x2b7/0x500
kswapd+0x1e2/0x3b0

I believe it is not related to ce->signal_lock; the lock should work normally.

i915_request_enable_breadcrumb() can be invoked from several contexts: from ioctl(), from interrupt context, and from the memory swap thread. I suggest adding a check before inserting the i915_request into the list. It is hard to guarantee a valid call on every path, but a check can prevent the critical effect: adding the same i915_request twice triggers an endless loop in signal_irq_work(). The loop keeps running until i915_request.hwsp_seqno is changed, after which an invalid address access error is reported, followed by a system panic.
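
To illustrate why a double add is so damaging, here is a minimal userspace sketch. It is not the i915 code: it assumes a circular doubly linked list modelled on the kernel's list_head, and list_node, list_add_after(), list_count_bounded() and the demo helpers are hypothetical names used only for illustration. Adding a node that is already linked right after the head leaves node->next pointing at the node itself, so a walk never gets back to the head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head;
	head->next = head;
}

/* Link 'node' right after 'pos', in the same order as the kernel's list_add(). */
static void list_add_after(struct list_node *node, struct list_node *pos)
{
	struct list_node *next = pos->next;

	next->prev = node;
	node->next = next;
	node->prev = pos;
	pos->next = node;
}

/*
 * Walk the list and count its entries. Returns -1 if the walk has not
 * come back to 'head' after 'limit' steps, i.e. the list is corrupted.
 */
static int list_count_bounded(const struct list_node *head, int limit)
{
	const struct list_node *it;
	int n = 0;

	assert(limit > 0);
	for (it = head->next; it != head; it = it->next)
		if (++n > limit)
			return -1;
	return n;
}

/* One add: the walk terminates and sees exactly one entry. */
static int demo_single_add(void)
{
	struct list_node head, rq;

	list_init(&head);
	list_add_after(&rq, &head);
	return list_count_bounded(&head, 16);
}

/* Same node added twice: rq.next ends up pointing at rq itself. */
static int demo_double_add(void)
{
	struct list_node head, rq;

	list_init(&head);
	list_add_after(&rq, &head);
	list_add_after(&rq, &head);
	return list_count_bounded(&head, 16);
}
```

In this sketch demo_single_add() reports one entry, while demo_double_add() hits the step limit because the node links to itself. An unbounded walk over such a list would spin forever, which matches the symptom described for signal_irq_work().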

Maybe, but I was pointing out that a double insert_breadcrumb is already protected against when called inside i915_request_enable_breadcrumb, by virtue of the spinlock and I915_FENCE_FLAG_SIGNAL. So maybe it is a race with remove or something, but it looks unlikely to be a simple double add due to parallel enablement.
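
The lock-plus-flag guard can be sketched in userspace roughly as follows. This is a simplified model under stated assumptions, not the driver code: toy_context, toy_request, the FLAG_* bits and all toy_*() helpers are hypothetical, a C11 atomic_flag stands in for the spinlock, and a counter stands in for the ce->signals list.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define FLAG_ACTIVE (1u << 0)	/* models I915_FENCE_FLAG_ACTIVE */
#define FLAG_SIGNAL (1u << 1)	/* models I915_FENCE_FLAG_SIGNAL */

struct toy_context {
	atomic_flag lock;	/* stands in for ce->signal_lock */
	int nr_signals;		/* stands in for the ce->signals list */
};

struct toy_request {
	struct toy_context *ce;
	unsigned int flags;	/* only touched with ce->lock held */
};

static void toy_lock(struct toy_context *ce)
{
	while (atomic_flag_test_and_set_explicit(&ce->lock, memory_order_acquire))
		;	/* spin until the flag is released */
}

static void toy_unlock(struct toy_context *ce)
{
	atomic_flag_clear_explicit(&ce->lock, memory_order_release);
}

/* Caller holds ce->lock. A request already flagged is never added again. */
static void toy_insert(struct toy_request *rq)
{
	if (rq->flags & FLAG_SIGNAL)
		return;

	rq->ce->nr_signals++;
	rq->flags |= FLAG_SIGNAL;
}

static void toy_enable(struct toy_request *rq)
{
	toy_lock(rq->ce);
	if (rq->flags & FLAG_ACTIVE)
		toy_insert(rq);
	toy_unlock(rq->ce);
}

static void toy_cancel(struct toy_request *rq)
{
	toy_lock(rq->ce);
	if (rq->flags & FLAG_SIGNAL) {
		assert(rq->ce->nr_signals > 0);
		rq->flags &= ~FLAG_SIGNAL;
		rq->ce->nr_signals--;
	}
	toy_unlock(rq->ce);
}

/* Enabling twice leaves a single entry because of the flag check. */
static int demo_enable_twice(void)
{
	struct toy_context ce = { .lock = ATOMIC_FLAG_INIT, .nr_signals = 0 };
	struct toy_request rq = { .ce = &ce, .flags = FLAG_ACTIVE };

	toy_enable(&rq);
	toy_enable(&rq);
	return ce.nr_signals;
}

/* Cancel clears the flag, so a later enable adds the request once more. */
static int demo_cancel_then_enable(void)
{
	struct toy_context ce = { .lock = ATOMIC_FLAG_INIT, .nr_signals = 0 };
	struct toy_request rq = { .ce = &ce, .flags = FLAG_ACTIVE };

	toy_enable(&rq);
	toy_cancel(&rq);
	toy_enable(&rq);
	return ce.nr_signals;
}
```

Because the flag is tested and set while the lock is held, two enable calls in this model cannot both add the request. A double add would need some path that changes the flag or the list outside this pattern.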

Regards,

Tvrtko


Thanks,
Dong

-----Original Message-----
From: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
Sent: Friday, December 10, 2021 4:51 PM
To: Yang, Dong <dong.yang@xxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx
Subject: Re:  [PATCH] drm/i915/gt: Do not add same i915_request to intel_context twice


On 10/12/2021 01:31, dong.yang@xxxxxxxxx wrote:
From: "Yang, Dong" <dong.yang@xxxxxxxxx>

With an unknown race condition, the i915_request will be added

What do you mean with unknown here?

to intel_context list twice, and result in system panic.

If the node already exists, do not add it again.

Note that the call chains are under ce->signal_lock and protect against a double add, AFAICT:

static void insert_breadcrumb(struct i915_request *rq)
{
...
	if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
		return;
...
	set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);


bool i915_request_enable_breadcrumb(struct i915_request *rq)
{
...
	spin_lock(&ce->signal_lock);
	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
		insert_breadcrumb(rq);
	spin_unlock(&ce->signal_lock);


void i915_request_cancel_breadcrumb(struct i915_request *rq)
{
...
	spin_lock(&ce->signal_lock);
	if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
		spin_unlock(&ce->signal_lock);
		return;
	}

void intel_context_remove_breadcrumbs(struct intel_context *ce,
				      struct intel_breadcrumbs *b)
{
...
	spin_lock_irqsave(&ce->signal_lock, flags);

	if (list_empty(&ce->signals))
		goto unlock;

	list_for_each_entry_safe(rq, rn, &ce->signals, signal_link) {
		GEM_BUG_ON(!__i915_request_is_complete(rq));
		if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL,
					&rq->fence.flags))
			continue;

The last one in signal_irq_work is guarded by the __i915_request_is_complete check.

So I think more context is needed on how you found this may be an issue.

Regards,

Tvrtko


Signed-off-by: Yang, Dong <dong.yang@xxxxxxxxx>
---
   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 3 +++
   1 file changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 209cf265bf74..9c7bc060d2ae 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -387,6 +387,9 @@ static void insert_breadcrumb(struct i915_request *rq)
   		}
   	}
+	if (&rq->signal_link == pos)
+		return;
+
   	i915_request_get(rq);
   	list_add_rcu(&rq->signal_link, pos);
   	GEM_BUG_ON(!check_signal_order(ce, rq));



