On 21.08.19 at 19:35, Chris Wilson wrote:
Quoting Chris Wilson (2019-08-21 16:24:22)
Quoting Christian König (2019-08-21 13:31:45)
@@ -117,17 +120,10 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
busy_check_writer(rcu_dereference(obj->base.resv->fence_excl));
/* Translate shared fences to READ set of engines */
- list = rcu_dereference(obj->base.resv->fence);
- if (list) {
- unsigned int shared_count = list->shared_count, i;
-
- for (i = 0; i < shared_count; ++i) {
- struct dma_fence *fence =
- rcu_dereference(list->shared[i]);
-
- args->busy |= busy_check_reader(fence);
- }
- }
+ readers = dma_resv_fences_get_rcu(&obj->base.resv->readers);
+ dma_fence_array_for_each(fence, cursor, readers)
+ args->busy |= busy_check_reader(fence);
+ dma_fence_put(readers);
That's underwhelming; the full-mb shows up in scaling tests (I'll test
the impact of this series later). Something like,
To put some numbers to it, adding the full-mb adds 5ns to a single
thread on Kabylake and 20ns under contention.
The question is whether that's the use case we want to optimize for.
Querying a buffer for busyness is something we do very rarely on
amdgpu; IIRC we even grab the full reservation lock for it.
But adding new fences happens with every command submission.
What could maybe work is the "do { } while (fence has changed)" loop you
suggested earlier in this mail thread (a rough sketch of what I mean is
below), but I need to double-check whether that would really work or
clash with recycling dma_fence_arrays.
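
Roughly, I'm thinking of something like this (just a sketch, reusing the
names from the hunk above; fence and cursor are declared as in the patch,
and the pointer comparison at the end is exactly the part I'm not sure
about):

	struct dma_fence *readers, *check;
	u32 busy;

	do {
		busy = 0;

		/* Snapshot the readers array and accumulate busyness. */
		readers = dma_resv_fences_get_rcu(&obj->base.resv->readers);
		dma_fence_array_for_each(fence, cursor, readers)
			busy |= busy_check_reader(fence);

		/*
		 * Re-read the pointer; if it changed while we iterated,
		 * the snapshot may be stale and we retry. This is where
		 * recycling dma_fence_arrays could bite: a recycled array
		 * may come back at the same address (ABA), so the pointer
		 * comparison alone might not be sufficient.
		 */
		check = dma_resv_fences_get_rcu(&obj->base.resv->readers);
		dma_fence_put(readers);
		dma_fence_put(check);
	} while (check != readers);

	args->busy |= busy;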
Christian.
-Chris