On 28/07/2020 15:28, Chris Wilson wrote:
Quoting Tvrtko Ursulin (2020-07-17 14:04:58)
On 15/07/2020 12:50, Chris Wilson wrote:
Rather than require the next timeline after idling to match the MRU
before idling, reset the index on the node and allow it to match the
first request. However, this requires cmpxchg(u64) and so is not trivial
on 32b, so for compatibility we just fallback to keeping the cached node
pointing to the MRU timeline.
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
---
drivers/gpu/drm/i915/i915_active.c | 21 +++++++++++++++++++--
1 file changed, 19 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 0854b1552bc1..6737b5615c0c 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -157,6 +157,10 @@ __active_retire(struct i915_active *ref)
rb_link_node(&ref->cache->node, NULL, &ref->tree.rb_node);
rb_insert_color(&ref->cache->node, &ref->tree);
GEM_BUG_ON(ref->tree.rb_node != &ref->cache->node);
+
+ /* Make the cached node available for reuse with any timeline */
+ if (IS_ENABLED(CONFIG_64BIT))
+ ref->cache->timeline = 0; /* needs cmpxchg(u64) */
Or when fence context wraps shock horror.
I more concerned about that we use timeline:0 as a special unordered
timeline. It's reserved by use in the dma_fence_stub, and everything
will start to break when the timelines wrap. The earliest causalities
will be the kernel_context timelines which are also very special indices
for the barriers.
}
spin_unlock_irqrestore(&ref->tree_lock, flags);
@@ -235,9 +239,22 @@ static struct active_node *__active_lookup(struct i915_active *ref, u64 idx)
{
struct active_node *it;
+ GEM_BUG_ON(idx == 0); /* 0 is the unordered timeline, rsvd for cache */
+
it = READ_ONCE(ref->cache);
- if (it && it->timeline == idx)
- return it;
+ if (it) {
+ u64 cached = READ_ONCE(it->timeline);
+
+ if (cached == idx)
+ return it;
+
+#ifdef CONFIG_64BIT /* for cmpxchg(u64) */
+ if (!cached && !cmpxchg(&it->timeline, 0, idx)) {
+ GEM_BUG_ON(i915_active_fence_isset(&it->base));
+ return it;
cpmxchg suggests this needs to be atomic, however above the check for
equality comes from a separate read.
That's fine, and quite common to avoid cmpxchg if the current value
already does not match the expected condition.
How? What is another thread is about to install its idx into
it->timeline with cmpxchg and this thread does not see it because it
just returned on the "cached == idx" condition.
Since there is a lookup code path under the spinlock, perhaps the
unlocked lookup could just fail, and then locked lookup could re-assign
the timeline without the need for cmpxchg?
The unlocked/locked lookup are the same routine. You pointed that out
:-p
Like I remember from ten days ago.. Anyway, I am pointing out it still
doesn't smell right.
__active_lookup(...) -> lockless
{
...
it = fetch_node(ref->tree.rb_node);
while (it) {
if (it->timeline < idx) {
it = fetch_node(it->node.rb_right);
} else if (it->timeline > idx) {
it = fetch_node(it->node.rb_left);
} else {
WRITE_ONCE(ref->cache, it);
break;
}
}
...
}
Then in active_instance, locked:
...
parent = NULL;
p = &ref->tree.rb_node;
while (*p) {
parent = *p;
node = rb_entry(parent, struct active_node, node);
if (node->timeline == idx) {
kmem_cache_free(global.slab_cache, prealloc);
goto out;
}
if (node->timeline < idx)
p = &parent->rb_right;
else
p = &parent->rb_left;
WRITE_ONCE(ref->cache, it);
break;
}
}
...
Tree walk could be consolidated between the two.
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx