On 11/12/2017 15:29, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2017-12-11 15:24:06)
On 11/12/2017 14:26, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2017-12-11 14:19:47)
On 08/12/2017 12:10, Chris Wilson wrote:
The intent here was that we would be listening to
i915_gem_request_unsubmit in order to cancel the signaler quickly and
release the reference on the request. Cancelling the signaler is done
directly via intel_engine_cancel_signaling (called from unsubmit), but
that does not directly wake up the signaling thread, and neither does
setting the request->global_seqno back to zero wake up listeners to the
request->execute waitqueue. So the only time that listening to the
request->execute waitqueue would wake up the signaling kthread would be
on the request resubmission, during which time we would already receive
wake ups from rejoining the global breadcrumbs wait rbtree.
Trying to wake up to release the request remains an issue. If the
signaling was cancelled and no other request required signaling, then it
is possible for us to shutdown with the reference on the request still
held. To ensure that we do not try to shutdown, leaking that request, we
kick the signaling threads whenever we disarm the breadcrumbs, i.e. on
parking the engine when idle.
Fixes: d6a2289d9d6b ("drm/i915: Remove the preempted request from the execution queue")
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx>
Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx>
---
drivers/gpu/drm/i915/intel_breadcrumbs.c | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index 5ae2d276f7f3..b826725989b1 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -216,7 +216,7 @@ void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
struct intel_wait *wait, *n, *first;
if (!b->irq_armed)
- return;
+ goto wakeup_signaler;
/* We only disarm the irq when we are idle (all requests completed),
* so if the bottom-half remains asleep, it missed the request
@@ -239,6 +239,14 @@ void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
b->waiters = RB_ROOT;
spin_unlock_irq(&b->rb_lock);
+
+ /*
+ * The signaling thread may be asleep holding a reference to a request,
+ * that had its signaling cancelled prior to being preempted. We need
+ * to kick the signaler, just in case, to release any such reference.
+ */
+wakeup_signaler:
+ wake_up_process(b->signaler);
One thing I was thinking about from our conversation, is whether we can
rely on the kthread_stop() kick for the final ref release? Whether it's
even a good idea to let the reference live for that long?
Probably not, feels awkward. In that sense previous version feels better.
Ok.
Basically it will mean replacing the GEM_BUG_ON(request) there with
another unref. But it will remove the complication and assoicated
commentary here.
Hm, and is there a test which was able to hit the GEM_BUG_ON? It needed
to go in the previous version as well, no?
It does indeed need to be there today, and with the v1 patch. Since
there's no serialisation to allow the previous wakeup to complete and
schedule() again before we hit the kthread_should_stop.
Right, it shouldn't be a GEM_BUG_ON for an expected code path but
replaced with i915_gem_request_put?
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx