Re: [PATCH v2 7/8] drm/i915: Grab execlist spinlock to avoid post-reset concurrency issues.

Dave Gordon <david.s.gordon@xxxxxxxxx> · Fri, 23 Oct 2015 13:49:57 +0100

On 23/10/15 12:02, Tomas Elf wrote:
On 23/10/2015 09:59, Daniel Vetter wrote:
On Fri, Oct 23, 2015 at 09:42:27AM +0100, Tvrtko Ursulin wrote:

On 19/10/15 16:32, Tomas Elf wrote:
Grab execlist lock when cleaning up execlist queues after GPU reset
to avoid
concurrency problems between the context event interrupt handler and
the reset
path immediately following a GPU reset.

* v2 (Chris Wilson):
Do execlist check and use simpler form of spinlock functions.

Signed-off-by: Tomas Elf <tomas.elf@xxxxxxxxx>
---
  drivers/gpu/drm/i915/i915_gem.c | 23 ++++++++++++++---------
  1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c
b/drivers/gpu/drm/i915/i915_gem.c
index e57061a..2c7a0b7 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2753,18 +2753,23 @@ static void
i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
       * are the ones that keep the context and ringbuffer backing
objects
       * pinned in place.
       */
-    while (!list_empty(&ring->execlist_queue)) {
-        struct drm_i915_gem_request *submit_req;

-        submit_req = list_first_entry(&ring->execlist_queue,
-                struct drm_i915_gem_request,
-                execlist_link);
-        list_del(&submit_req->execlist_link);
+    if (i915.enable_execlists) {
+        spin_lock_irq(&ring->execlist_lock);
+        while (!list_empty(&ring->execlist_queue)) {
+            struct drm_i915_gem_request *submit_req;

-        if (submit_req->ctx != ring->default_context)
-            intel_lr_context_unpin(submit_req);
+            submit_req = list_first_entry(&ring->execlist_queue,
+                    struct drm_i915_gem_request,
+                    execlist_link);
+            list_del(&submit_req->execlist_link);

-        i915_gem_request_unreference(submit_req);
+            if (submit_req->ctx != ring->default_context)
+                intel_lr_context_unpin(submit_req);
+
+            i915_gem_request_unreference(submit_req);
+        }
+        spin_unlock_irq(&ring->execlist_lock);

Can't this, in theory at least, end up calling
drm_gem_object_unreference in
a couple of code paths, which can sleep, while holding a spinlock?

If this cannot happen in practice for some reason it would probably
be good
to put a comment explaining it.

Or I am missing something?

It can indeed I think. Tvrtko, since I'm off to KS next week and you have
push access, can you pls push the revert if this checks out?

I've discussed it with Tvrtko and one solution that seems reasonable is
to hold the execlist lock while traversing the execlist queue and moving
the elements over to a temporary list. After that we can drop the
execlist lock, traverse the temporary list and unreference the elements
while not holding lock. That way we can get rid of the element in a
synchronized fashion and also unreference without holding the lock. I
believe this pattern is used elsewhere (such as during request retirement)

Does that sound like a plan or does anyone have a problem with that
solution?

Thanks,
Tomas

I think the original is safe, because the execlist queue /shouldn't/ be 
the last reference. But that could change, and in any case in a possibly 
broken state such as may have caused the hang we probably shouldn't rely 
on it. So I think the staged-dequeue-and-release is a better solution :)

.Dave.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx