Re: [PATCH 08/66] drm/i915: Make the stale cached active node available for any timeline

On 29/07/2020 14:42, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2020-07-29 13:40:38)

On 28/07/2020 15:28, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2020-07-17 14:04:58)

On 15/07/2020 12:50, Chris Wilson wrote:
Rather than require the next timeline after idling to match the MRU
before idling, reset the index on the node and allow it to match the
first request. However, this requires cmpxchg(u64) and so is not trivial
on 32b, so for compatibility we just fallback to keeping the cached node
pointing to the MRU timeline.

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
---
    drivers/gpu/drm/i915/i915_active.c | 21 +++++++++++++++++++--
    1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 0854b1552bc1..6737b5615c0c 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -157,6 +157,10 @@ __active_retire(struct i915_active *ref)
                rb_link_node(&ref->cache->node, NULL, &ref->tree.rb_node);
                rb_insert_color(&ref->cache->node, &ref->tree);
                GEM_BUG_ON(ref->tree.rb_node != &ref->cache->node);
+
+             /* Make the cached node available for reuse with any timeline */
+             if (IS_ENABLED(CONFIG_64BIT))
+                     ref->cache->timeline = 0; /* needs cmpxchg(u64) */

Or when the fence context wraps, shock horror.

I'm more concerned that we use timeline:0 as a special unordered
timeline. It's reserved for use by the dma_fence_stub, and everything
will start to break when the timelines wrap. The earliest casualties
will be the kernel_context timelines, which are also very special indices
for the barriers.
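
For reference, fence timeline ids are handed out by a monotonic 64-bit
allocator that starts at 1, so 0 only comes back into circulation after a
full wrap; roughly this shape (paraphrased sketch, not verbatim kernel
source):

        /*
         * Paraphrased sketch of the fence context allocator: a 64-bit
         * counter starting at 1, so context/timeline 0 is never handed out
         * until the counter wraps all the way around.
         */
        static atomic64_t fence_context_counter = ATOMIC64_INIT(1);

        static u64 fence_context_alloc(unsigned int num)
        {
                WARN_ON(!num);
                return atomic64_fetch_add(num, &fence_context_counter);
        }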


        }
        spin_unlock_irqrestore(&ref->tree_lock, flags);
@@ -235,9 +239,22 @@ static struct active_node *__active_lookup(struct i915_active *ref, u64 idx)
    {
        struct active_node *it;
+     GEM_BUG_ON(idx == 0); /* 0 is the unordered timeline, rsvd for cache */
+
        it = READ_ONCE(ref->cache);
-     if (it && it->timeline == idx)
-             return it;
+     if (it) {
+             u64 cached = READ_ONCE(it->timeline);
+
+             if (cached == idx)
+                     return it;
+
+#ifdef CONFIG_64BIT /* for cmpxchg(u64) */
+             if (!cached && !cmpxchg(&it->timeline, 0, idx)) {
+                     GEM_BUG_ON(i915_active_fence_isset(&it->base));
+                     return it;

cmpxchg suggests this needs to be atomic; however, the equality check
above comes from a separate read.

That's fine, and quite common to avoid cmpxchg if the current value
already does not match the expected condition.

How? What if another thread is about to install its idx into
it->timeline with cmpxchg and this thread does not see it, because it
just returned on the "cached == idx" condition?

Because it's nonzero.

If the idx is already non-zero, it will always remain non-zero until
everybody idles (and there are no more threads).

If the idx is zero, it can only transition to non-zero once, atomically
via cmpxchg. The first, and only the first, cmpxchg will see that the
previous value was 0, and so returns with it->timeline == idx.
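
Spelling that out as a self-contained sketch (hypothetical helper name,
assuming the 64-bit path where cmpxchg() works on the u64 timeline, i.e.
not what the 32b fallback does):

        /*
         * Sketch of the lockless reuse rule above: timeline == 0 means the
         * cached node is idle and unclaimed; it makes at most one 0 -> idx
         * transition, and that transition is the atomic cmpxchg, so the
         * earlier plain read can never see a claimed slot go back to free.
         */
        static struct active_node *
        try_reuse_cached(struct active_node *it, u64 idx)
        {
                u64 cached = READ_ONCE(it->timeline);

                if (cached == idx)      /* already ours; idx is never 0 */
                        return it;

                if (cached)             /* owned by another timeline until idle */
                        return NULL;

                /* Only one thread can win the 0 -> idx transition. */
                if (!cmpxchg(&it->timeline, 0, idx))
                        return it;

                return NULL;            /* somebody else claimed it first */
        }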

I think this is worthy of a comment, to save a future reader having to re-figure it all out.

Since there is a lookup code path under the spinlock, perhaps the
unlocked lookup could just fail, and then the locked lookup could
re-assign the timeline without the need for cmpxchg?

The unlocked/locked lookup are the same routine. You pointed that out
:-p

Like I remember from ten days ago.. Anyway, I am pointing out it still
doesn't smell right.

__active_lookup(...) -> lockless
{
...
         it = fetch_node(ref->tree.rb_node);
         while (it) {
                 if (it->timeline < idx) {
                         it = fetch_node(it->node.rb_right);
                 } else if (it->timeline > idx) {
                         it = fetch_node(it->node.rb_left);
                 } else {
                         WRITE_ONCE(ref->cache, it);
                         break;
                 }
         }
...
}

Then in active_instance, locked:

...
         parent = NULL;
         p = &ref->tree.rb_node;
         while (*p) {
                 parent = *p;

                 node = rb_entry(parent, struct active_node, node);
                 if (node->timeline == idx) {
                         kmem_cache_free(global.slab_cache, prealloc);
                         goto out;
                 }

                 if (node->timeline < idx)
                         p = &parent->rb_right;
                 else
                         p = &parent->rb_left;
         }
...

Tree walk could be consolidated between the two.

This tree walk is subtly different, as we aren't just interested in the
node, but its parent. The exact repetitions have been consolidated into
__active_lookup.

It returns the previous/parent node if idx is not found, so yeah, a common helper would need two out parameters: one returning the match (or NULL), the other returning the previous/parent node. You think that is not worth it?
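
Roughly, such a helper could look like this (untested sketch, hypothetical
name, reusing the fetch_node() helper from the lockless walk above):

        /*
         * Untested sketch of a consolidated walk: returns the matching node
         * (or NULL) and reports the last node visited via *prev so the
         * locked caller can use it as the insertion point.
         */
        static struct active_node *
        __active_walk(struct i915_active *ref, u64 idx, struct active_node **prev)
        {
                struct active_node *it = fetch_node(ref->tree.rb_node);

                *prev = NULL;
                while (it) {
                        if (it->timeline == idx)
                                return it;

                        *prev = it;
                        if (it->timeline < idx)
                                it = fetch_node(it->node.rb_right);
                        else
                                it = fetch_node(it->node.rb_left);
                }

                return NULL;
        }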

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


