Am 15.11.2017 um 17:55 schrieb Chris Wilson:
Quoting Chris Wilson (2017-11-14 14:34:05)
Quoting Christian König (2017-11-14 14:24:44)
Am 06.11.2017 um 17:22 schrieb Chris Wilson:
Quoting Christian König (2017-10-30 14:59:04)
@@ -126,17 +127,28 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
dma_fence_put(old_fence);
return;
}
+
+ if (!signaled && dma_fence_is_signaled(old_fence)) {
+ signaled = old_fence;
+ signaled_idx = i;
+ }
How much do we care about only keeping one fence per-ctx here? You could
rearrange this to break on old_fence->context == fence->context ||
dma_fence_is_signaled(old_fence), so that both cases can use the final block.
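(For reference, the "final block" after the loop presumably ends up looking
roughly like the sketch below: reuse the slot of a signaled fence when one was
found, otherwise append. This is only a sketch built from the signaled /
signaled_idx variables in the hunk above, not a verbatim quote of the patch.)

	if (signaled) {
		/* reuse the slot of the first signaled fence we saw */
		RCU_INIT_POINTER(fobj->shared[signaled_idx], fence);
	} else {
		/* no reusable slot, append and grow the list */
		RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
		fobj->shared_count++;
	}

	write_seqcount_end(&obj->seq);
	preempt_enable();

	dma_fence_put(signaled);	/* dma_fence_put() ignores NULL */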
Yeah, that is what David Zhou suggested as well.
I've rejected this approach for now because I think we still have cases
where we rely on one fence per ctx (but I'm not 100% sure).
I changed patch #1 in this series as you suggested and am going to send
that out once more in a minute.
Can we get this upstream as-is for now? I won't have much more time to
work on this.
Sure, we are only discussing how we might make it look tidier; it's a pure
micro-optimisation, with the caveat of losing the one-fence-per-ctx
guarantee.
Ah, one thing to note is that the extra checking pushed one of our
corner-case tests over its time limit.
If we can completely forgo the one-fence-per-ctx guarantee here, what works
really well in testing is:
diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
index 5319ac478918..5755e95fab1b 100644
--- a/drivers/dma-buf/reservation.c
+++ b/drivers/dma-buf/reservation.c
@@ -104,39 +104,19 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
struct reservation_object_list *fobj,
struct dma_fence *fence)
{
- struct dma_fence *replace = NULL;
- u32 ctx = fence->context;
- u32 i;
-
dma_fence_get(fence);
preempt_disable();
write_seqcount_begin(&obj->seq);
- for (i = 0; i < fobj->shared_count; ++i) {
- struct dma_fence *check;
-
- check = rcu_dereference_protected(fobj->shared[i],
- reservation_object_held(obj));
-
- if (check->context == ctx || dma_fence_is_signaled(check)) {
- replace = check;
- break;
- }
- }
-
/*
* memory barrier is added by write_seqcount_begin,
* fobj->shared_count is protected by this lock too
*/
- RCU_INIT_POINTER(fobj->shared[i], fence);
- if (!replace)
- fobj->shared_count++;
+ RCU_INIT_POINTER(fobj->shared[fobj->shared_count++], fence);
write_seqcount_end(&obj->seq);
preempt_enable();
-
- dma_fence_put(replace);
}
static void
i.e. don't do any checking when we are not replacing the whole shared[]
array; when we create the new array we then discard all the old fences.
It should work for amdgpu as well, since you use a hashtable to coalesce
redundant fences before queuing.
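(For context, "discard all the old fences" refers to the replace path that
builds a brand-new shared[] array: only the fences that still matter are
carried over, and the rest are released once the new array is visible to
readers. The following is a sketch along those lines, using the parameter
names of reservation_object_add_shared_replace; it is an illustration, not a
verbatim quote of the other patch in this series.)

	unsigned int i, j = 0, k = fobj->shared_max;

	for (i = 0; i < old->shared_count; ++i) {
		struct dma_fence *check;

		check = rcu_dereference_protected(old->shared[i],
						  reservation_object_held(obj));

		/*
		 * Same-context and already signaled fences are parked at the
		 * tail of the new array so their references can be dropped
		 * after publication; everything else is carried over to the
		 * front.
		 */
		if (check->context == fence->context ||
		    dma_fence_is_signaled(check))
			RCU_INIT_POINTER(fobj->shared[--k], check);
		else
			RCU_INIT_POINTER(fobj->shared[j++], check);
	}
	fobj->shared_count = j;
	RCU_INIT_POINTER(fobj->shared[fobj->shared_count], fence);
	fobj->shared_count++;

	/* ... publish the new array under the seqcount, then ... */

	for (i = k; i < fobj->shared_max; ++i)
		dma_fence_put(rcu_dereference_protected(fobj->shared[i],
						reservation_object_held(obj)));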
That won't work for all cases. This way the reservation object would
keep growing without a chance to ever shrink.
Christian.
-Chris