Quoting Chris Wilson (2017-11-14 14:34:05) > Quoting Christian König (2017-11-14 14:24:44) > > Am 06.11.2017 um 17:22 schrieb Chris Wilson: > > > Quoting Christian König (2017-10-30 14:59:04) > > >> @@ -126,17 +127,28 @@ reservation_object_add_shared_inplace(struct reservation_object *obj, > > >> dma_fence_put(old_fence); > > >> return; > > >> } > > >> + > > >> + if (!signaled && dma_fence_is_signaled(old_fence)) { > > >> + signaled = old_fence; > > >> + signaled_idx = i; > > >> + } > > > How much do we care about only keeping one fence per-ctx here? You could > > > rearrange this to break on old_fence->context == fence->context || > > > dma_fence_is_signaled(old_fence) then everyone can use the final block. > > > > Yeah, that is what David Zhou suggested as well. > > > > I've rejected this approach for now cause I think we still have cases > > where we rely on one fence per ctx (but I'm not 100% sure). > > > > I changed patch #1 in this series as you suggest and going to send that > > out once more in a minute. > > > > Can we get this upstream as is for now? I won't have much more time > > working on this. > > Sure, we are only discussing how we might make it look tidier, pure > micro-optimisation with the caveat of losing the one-fence-per-ctx > guarantee. Ah, one thing to note is that extra checking pushed one of our corner case tests over its time limit. If we can completely forgo the one-fence-per-ctx here, what works really well in testing is diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c index 5319ac478918..5755e95fab1b 100644 --- a/drivers/dma-buf/reservation.c +++ b/drivers/dma-buf/reservation.c @@ -104,39 +104,19 @@ reservation_object_add_shared_inplace(struct reservation_object *obj, struct reservation_object_list *fobj, struct dma_fence *fence) { - struct dma_fence *replace = NULL; - u32 ctx = fence->context; - u32 i; - dma_fence_get(fence); preempt_disable(); write_seqcount_begin(&obj->seq); - for (i = 0; i < fobj->shared_count; ++i) { - struct dma_fence *check; - - check = rcu_dereference_protected(fobj->shared[i], - reservation_object_held(obj)); - - if (check->context == ctx || dma_fence_is_signaled(check)) { - replace = old_fence; - break; - } - } - /* * memory barrier is added by write_seqcount_begin, * fobj->shared_count is protected by this lock too */ - RCU_INIT_POINTER(fobj->shared[i], fence); - if (!replace) - fobj->shared_count++; + RCU_INIT_POINTER(fobj->shared[fobj->shared_count++], fence); write_seqcount_end(&obj->seq); preempt_enable(); - - dma_fence_put(replace); } static void i.e. don't check when not replacing the shared[], on creating the new buffer we then discard all the old fences. It should work for amdgpu as well since you do a ht to coalesce redundant fences before queuing. -Chris