On 15.11.2017 at 17:55, Chris Wilson wrote:
> Quoting Chris Wilson (2017-11-14 14:34:05)
>> Quoting Christian König (2017-11-14 14:24:44)
>>> On 06.11.2017 at 17:22, Chris Wilson wrote:
>>>> Quoting Christian König (2017-10-30 14:59:04)
>>>>> @@ -126,17 +127,28 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>>>>>  			dma_fence_put(old_fence);
>>>>>  			return;
>>>>>  		}
>>>>> +
>>>>> +		if (!signaled && dma_fence_is_signaled(old_fence)) {
>>>>> +			signaled = old_fence;
>>>>> +			signaled_idx = i;
>>>>> +		}
>>>> How much do we care about only keeping one fence per-ctx here? You could
>>>> rearrange this to break on old_fence->context == fence->context ||
>>>> dma_fence_is_signaled(old_fence), then everyone can use the final block.
>>> Yeah, that is what David Zhou suggested as well.
>>>
>>> I've rejected this approach for now because I think we still have cases
>>> where we rely on one fence per ctx (but I'm not 100% sure).
>>>
>>> I changed patch #1 in this series as you suggested and am going to send it
>>> out once more in a minute.
>>>
>>> Can we get this upstream as-is for now? I won't have much more time
>>> to work on this.
>> Sure, we are only discussing how we might make it look tidier, pure
>> micro-optimisation with the caveat of losing the one-fence-per-ctx
>> guarantee.
> Ah, one thing to note is that the extra checking pushed one of our corner
> case tests over its time limit.
>
> If we can completely forgo the one-fence-per-ctx guarantee here, what
> works really well in testing is
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 5319ac478918..5755e95fab1b 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -104,39 +104,19 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>  				      struct reservation_object_list *fobj,
>  				      struct dma_fence *fence)
>  {
> -	struct dma_fence *replace = NULL;
> -	u32 ctx = fence->context;
> -	u32 i;
> -
>  	dma_fence_get(fence);
>
>  	preempt_disable();
>  	write_seqcount_begin(&obj->seq);
>
> -	for (i = 0; i < fobj->shared_count; ++i) {
> -		struct dma_fence *check;
> -
> -		check = rcu_dereference_protected(fobj->shared[i],
> -						  reservation_object_held(obj));
> -
> -		if (check->context == ctx || dma_fence_is_signaled(check)) {
> -			replace = check;
> -			break;
> -		}
> -	}
> -
>  	/*
>  	 * memory barrier is added by write_seqcount_begin,
>  	 * fobj->shared_count is protected by this lock too
>  	 */
> -	RCU_INIT_POINTER(fobj->shared[i], fence);
> -	if (!replace)
> -		fobj->shared_count++;
> +	RCU_INIT_POINTER(fobj->shared[fobj->shared_count++], fence);
>
>  	write_seqcount_end(&obj->seq);
>  	preempt_enable();
> -
> -	dma_fence_put(replace);
>  }
>
>  static void
>
> i.e. don't check when not replacing within shared[]; on creating the new
> buffer we then discard all the old fences.
>
> It should work for amdgpu as well, since you use a hash table to coalesce
> redundant fences before queueing.

That won't work in all cases. This way the reservation object would keep
growing without ever getting a chance to shrink.

Christian.

> -Chris
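
For reference, the rearrangement Chris describes in his 06.11 reply ("break
on old_fence->context == fence->context || dma_fence_is_signaled(old_fence),
then everyone can use the final block") would come out roughly as the sketch
below. This is assembled from the discussion above, not committed code, and
as noted it gives up the one-fence-per-ctx guarantee:

static void
reservation_object_add_shared_inplace(struct reservation_object *obj,
				      struct reservation_object_list *fobj,
				      struct dma_fence *fence)
{
	struct dma_fence *replace = NULL;
	u32 i;

	dma_fence_get(fence);

	preempt_disable();
	write_seqcount_begin(&obj->seq);

	for (i = 0; i < fobj->shared_count; ++i) {
		struct dma_fence *old_fence;

		old_fence = rcu_dereference_protected(fobj->shared[i],
					reservation_object_held(obj));

		/*
		 * Reuse the first slot holding either a fence from the
		 * same context or one that has already signaled.  Only
		 * the first signaled fence found is replaced, so other
		 * contexts may keep more than one fence in the list.
		 */
		if (old_fence->context == fence->context ||
		    dma_fence_is_signaled(old_fence)) {
			replace = old_fence;
			break;
		}
	}

	/*
	 * memory barrier is added by write_seqcount_begin,
	 * fobj->shared_count is protected by this lock too.
	 * If no slot was found, i == fobj->shared_count and the
	 * new fence is appended instead.
	 */
	RCU_INIT_POINTER(fobj->shared[i], fence);
	if (!replace)
		fobj->shared_count++;

	write_seqcount_end(&obj->seq);
	preempt_enable();

	dma_fence_put(replace); /* dma_fence_put(NULL) is a no-op */
}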
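
The hash-table coalescing referred to above lives in amdgpu's sync code
(amdgpu_sync.c). The sketch below only illustrates the idea; the struct
name, function name, table sizing and locking here are made up for the
example and do not match the actual amdgpu implementation:

#include <linux/dma-fence.h>
#include <linux/hashtable.h>
#include <linux/slab.h>

struct sync_entry {
	struct hlist_node node;
	struct dma_fence *fence;
};

/* 2^7 buckets; sizing is illustrative only */
static DEFINE_HASHTABLE(sync_ht, 7);

/*
 * Keep only the latest fence per context: if an entry for this fence's
 * context already exists, replace its fence in place instead of adding
 * a second entry.  Coalescing like this before queueing is why Chris
 * argues the append-only add_shared_inplace() variant would be
 * acceptable for amdgpu's own submissions.
 */
static int sync_coalesce_fence(struct dma_fence *f)
{
	struct sync_entry *e;

	hash_for_each_possible(sync_ht, e, node, f->context) {
		if (e->fence->context != f->context)
			continue;

		/* same context: keep whichever fence signals later */
		if (dma_fence_is_later(f, e->fence)) {
			dma_fence_put(e->fence);
			e->fence = dma_fence_get(f);
		}
		return 0;
	}

	e = kmalloc(sizeof(*e), GFP_KERNEL);
	if (!e)
		return -ENOMEM;

	e->fence = dma_fence_get(f);
	hash_add(sync_ht, &e->node, f->context);
	return 0;
}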