On 20/01/2020 20:32, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2020-01-20 19:47:08)
On 20/01/2020 17:57, Chris Wilson wrote:
Keep the rq->fence.flags consistent with the status of the
rq->sched.link, and clear the associated bits when decoupling the link
on retirement (as we may wish to inspect those flags independent of
other state).
Fixes: 32ff621fd744 ("drm/i915/gt: Allow temporary suspension of inflight requests")
References: https://gitlab.freedesktop.org/drm/intel/issues/997
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
---
drivers/gpu/drm/i915/i915_request.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 9ed0d3bc7249..78a5f5d3c070 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -221,6 +221,8 @@ static void remove_from_engine(struct i915_request *rq)
locked = engine;
}
list_del_init(&rq->sched.link);
+ clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
This one I think can not be set in retirement. Or there is a path?
[comes back after writing the comment below]
Race between completion to hold puts the request on hold, then request
completes just as it is un-held? It needs retire to happen at the right
time, driven by ...? Is this it?
+ clear_bit(I915_FENCE_FLAG_HOLD, &rq->fence.flags);
This one I think indeed can race with completion.
Fwiw, this appears to be the active part of the trace
<0>[ 56.132334] <idle>-0 1.Ns1 52042407us : execlists_reset_finish: 0000:00:02.0 vcs1: depth->1
<0>[ 56.132415] rs:main -428 0..s. 52042429us : process_csb: 0000:00:02.0 vcs1: cs-irq head=5, tail=1
<0>[ 56.132501] rs:main -428 0..s. 52042432us : process_csb: 0000:00:02.0 vcs1: csb[0]: status=0x00000001:0x00000000
<0>[ 56.132591] rs:main -428 0..s. 52042437us : trace_ports: 0000:00:02.0 vcs1: promote { b:6!, 0:0 }
<0>[ 56.132676] rs:main -428 0..s. 52042442us : process_csb: 0000:00:02.0 vcs1: csb[1]: status=0x00000818:0x00000020
<0>[ 56.132766] rs:main -428 0..s. 52042445us : trace_ports: 0000:00:02.0 vcs1: completed { b:6!, 0:0 }
<0>[ 56.132860] kworker/-12 0.... 52042771us : i915_request_retire: 0000:00:02.0 vcs1: fence b:6, current 6
<0>[ 56.132949] kworker/-65 1d..1 52044886us : execlists_unhold: 0000:00:02.0 vcs0: fence 27:2, current 2 hold release
<0>[ 56.133041] ksoftirq-16 1..s. 52044912us : process_csb: 0000:00:02.0 vcs0: cs-irq head=2, tail=2
<0>[ 56.133118] ksoftirq-16 1d.s1 52044916us : __i915_request_submit: 0000:00:02.0 vcs0: fence 27:2, current 2
<0>[ 56.133216] kworker/-65 1.... 52044946us : i915_request_retire: 0000:00:02.0 vcs0: fence 9:14, current 14
<0>[ 56.133314] kworker/-65 1d... 52044986us : __i915_request_commit: 0000:00:02.0 vcs0: fence 9:16, current 14
<0>[ 56.133402] kworker/-65 1d... 52044993us : __engine_park: 0000:00:02.0 vcs0:
<0>[ 56.133490] kworker/-65 1d..2 52045032us : __i915_request_submit: 0000:00:02.0 vcs0: fence 9:16, current 14
<0>[ 56.133581] kworker/-65 1d..2 52045892us : trace_ports: 0000:00:02.0 vcs0: submit { 9:16, 0:0 }
<0>[ 56.133664] <idle>-0 0dNH2 52045932us : __intel_context_retire: 0000:00:02.0 vcs0: context:27 retire
<0>[ 56.133751] <idle>-0 0.Ns1 52045949us : process_csb: 0000:00:02.0 vcs0: cs-irq head=2, tail=4
<0>[ 56.133838] <idle>-0 0.Ns1 52045950us : process_csb: 0000:00:02.0 vcs0: csb[3]: status=0x00000001:0x00000000
<0>[ 56.133927] <idle>-0 0.Ns1 52045951us : trace_ports: 0000:00:02.0 vcs0: promote { 9:16!, 0:0 }
<0>[ 56.134013] <idle>-0 0.Ns1 52045951us : process_csb: 0000:00:02.0 vcs0: csb[4]: status=0x00000818:0x00000060
<0>[ 56.134102] <idle>-0 0.Ns1 52045952us : trace_ports: 0000:00:02.0 vcs0: completed { 9:16!, 0:0 }
<0>[ 56.134198] kworker/-12 0.... 52045960us : i915_request_retire: 0000:00:02.0 vcs0: fence 9:16, current 16
<0>[ 56.134285] kworker/-12 0.... 52045962us : __engine_park: 0000:00:02.0 vcs0:
<0>[ 56.134361] kworker/-65 1d..1 52046503us : execlists_unhold: 0000:00:02.0 vcs1: fence 2a:2, current 1 hold release
<0>[ 56.134427] ksoftirq-16 1..s. 52046510us : process_csb: 0000:00:02.0 vcs1: cs-irq head=1, tail=1
It doesn't look like there's overlap between the hold and retire. :|
Which bit is the race? I coudln't spot the same request being mentioned
on two separate paths.
I mean I have no issues with the patch. Just trying to understand the
possible races and whether or not there should be an assert for PQUEUE
instead of clearing it on retire.
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx