Am 22.08.19 um 10:37 schrieb Christian König:
Am 21.08.19 um 19:35 schrieb Chris Wilson:
Quoting Chris Wilson (2019-08-21 16:24:22)
Quoting Christian König (2019-08-21 13:31:45)
@@ -117,17 +120,10 @@ i915_gem_busy_ioctl(struct drm_device *dev,
void *data,
busy_check_writer(rcu_dereference(obj->base.resv->fence_excl));
/* Translate shared fences to READ set of engines */
- list = rcu_dereference(obj->base.resv->fence);
- if (list) {
- unsigned int shared_count = list->shared_count, i;
-
- for (i = 0; i < shared_count; ++i) {
- struct dma_fence *fence =
- rcu_dereference(list->shared[i]);
-
- args->busy |= busy_check_reader(fence);
- }
- }
+ readers = dma_resv_fences_get_rcu(&obj->base.resv->readers);
+ dma_fence_array_for_each(fence, cursor, readers)
+ args->busy |= busy_check_reader(fence);
+ dma_fence_put(readers);
That's underwhelming, the full-mb shows up in scaling tests (I'll test
the impact of this series later). Something like,
To put some numbers to it, adding the full-mb adds 5ns to a single
thread on Kabylake and 20ns under contention.
The question is if that's the use case we want to optimize for.
Querying a buffer for business is something we do absolutely rarely on
amdgpu, e.g. IIRC we even grab the full reservation lock for this.
But adding new fences comes with every command submission.
What could maybe work is the "do { } while (fence has changed); loop
you suggested earlier in this mail thread, but I need to double check
if that would really work or clash with recycling dma_fence_arrays().
Ok thinking about it some more that won't work either.
See the dma_fence_array is only guaranteed to not change when you hold a
reference to it. Otherwise we don't guarantee anything.
Christian.
Christian.
-Chris