On 15.11.2017 at 17:55, Chris Wilson wrote:
> Quoting Chris Wilson (2017-11-14 14:34:05)
>> Quoting Christian König (2017-11-14 14:24:44)
>>> On 06.11.2017 at 17:22, Chris Wilson wrote:
>>>> Quoting Christian König (2017-10-30 14:59:04)
>>>>> @@ -126,17 +127,28 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>>>>>  			dma_fence_put(old_fence);
>>>>>  			return;
>>>>>  		}
>>>>> +
>>>>> +		if (!signaled && dma_fence_is_signaled(old_fence)) {
>>>>> +			signaled = old_fence;
>>>>> +			signaled_idx = i;
>>>>> +		}
>>>> How much do we care about only keeping one fence per-ctx here? You could
>>>> rearrange this to break on old_fence->context == fence->context ||
>>>> dma_fence_is_signaled(old_fence), then everyone can use the final block.
>>> Yeah, that is what David Zhou suggested as well.
>>>
>>> I've rejected this approach for now because I think we still have cases
>>> where we rely on one fence per ctx (but I'm not 100% sure).
>>>
>>> I changed patch #1 in this series as you suggested and am going to send it
>>> out once more in a minute.
>>>
>>> Can we get this upstream as-is for now? I won't have much more time
>>> to work on this.
>> Sure, we are only discussing how we might make it look tidier, pure
>> micro-optimisation with the caveat of losing the one-fence-per-ctx
>> guarantee.
> Ah, one thing to note is that the extra checking pushed one of our corner
> case tests over its time limit.
>
> If we can completely forgo the one-fence-per-ctx guarantee here, what
> works really well in testing is
>
> diff --git a/drivers/dma-buf/reservation.c b/drivers/dma-buf/reservation.c
> index 5319ac478918..5755e95fab1b 100644
> --- a/drivers/dma-buf/reservation.c
> +++ b/drivers/dma-buf/reservation.c
> @@ -104,39 +104,19 @@ reservation_object_add_shared_inplace(struct reservation_object *obj,
>  				      struct reservation_object_list *fobj,
>  				      struct dma_fence *fence)
>  {
> -	struct dma_fence *replace = NULL;
> -	u32 ctx = fence->context;
> -	u32 i;
> -
>  	dma_fence_get(fence);
>
>  	preempt_disable();
>  	write_seqcount_begin(&obj->seq);
>
> -	for (i = 0; i < fobj->shared_count; ++i) {
> -		struct dma_fence *check;
> -
> -		check = rcu_dereference_protected(fobj->shared[i],
> -						  reservation_object_held(obj));
> -
> -		if (check->context == ctx || dma_fence_is_signaled(check)) {
> -			replace = check;
> -			break;
> -		}
> -	}
> -
>  	/*
>  	 * memory barrier is added by write_seqcount_begin,
>  	 * fobj->shared_count is protected by this lock too
>  	 */
> -	RCU_INIT_POINTER(fobj->shared[i], fence);
> -	if (!replace)
> -		fobj->shared_count++;
> +	RCU_INIT_POINTER(fobj->shared[fobj->shared_count++], fence);
>
>  	write_seqcount_end(&obj->seq);
>  	preempt_enable();
> -
> -	dma_fence_put(replace);
>  }
>
>  static void
>
> i.e. don't check when not replacing within shared[]; on creating the new
> buffer we then discard all the old fences.
>
> It should work for amdgpu as well, since you use a hash table to coalesce
> redundant fences before queueing.

That won't work in all cases. This way the reservation object would keep
growing without ever getting a chance to shrink.

Christian.

> -Chris
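
For reference, the rearrangement Chris describes in his 06.11 reply ("break
on old_fence->context == fence->context || dma_fence_is_signaled(old_fence),
then everyone can use the final block") would come out roughly as the sketch
below. This is assembled from the discussion above, not committed code, and
as noted it gives up the one-fence-per-ctx guarantee:

static void
reservation_object_add_shared_inplace(struct reservation_object *obj,
				      struct reservation_object_list *fobj,
				      struct dma_fence *fence)
{
	struct dma_fence *replace = NULL;
	u32 i;

	dma_fence_get(fence);

	preempt_disable();
	write_seqcount_begin(&obj->seq);

	for (i = 0; i < fobj->shared_count; ++i) {
		struct dma_fence *old_fence;

		old_fence = rcu_dereference_protected(fobj->shared[i],
					reservation_object_held(obj));

		/*
		 * Reuse the first slot holding either a fence from the
		 * same context or one that has already signaled.  Only
		 * the first signaled fence found is replaced, so other
		 * contexts may keep more than one fence in the list.
		 */
		if (old_fence->context == fence->context ||
		    dma_fence_is_signaled(old_fence)) {
			replace = old_fence;
			break;
		}
	}

	/*
	 * memory barrier is added by write_seqcount_begin,
	 * fobj->shared_count is protected by this lock too.
	 * If no slot was found, i == fobj->shared_count and the
	 * new fence is appended instead.
	 */
	RCU_INIT_POINTER(fobj->shared[i], fence);
	if (!replace)
		fobj->shared_count++;

	write_seqcount_end(&obj->seq);
	preempt_enable();

	dma_fence_put(replace); /* dma_fence_put(NULL) is a no-op */
}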
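
The hash-table coalescing referred to above lives in amdgpu's sync code
(amdgpu_sync.c). The sketch below only illustrates the idea; the struct
name, function name, table sizing and locking here are made up for the
example and do not match the actual amdgpu implementation:

#include <linux/dma-fence.h>
#include <linux/hashtable.h>
#include <linux/slab.h>

struct sync_entry {
	struct hlist_node node;
	struct dma_fence *fence;
};

/* 2^7 buckets; sizing is illustrative only */
static DEFINE_HASHTABLE(sync_ht, 7);

/*
 * Keep only the latest fence per context: if an entry for this fence's
 * context already exists, replace its fence in place instead of adding
 * a second entry.  Coalescing like this before queueing is why Chris
 * argues the append-only add_shared_inplace() variant would be
 * acceptable for amdgpu's own submissions.
 */
static int sync_coalesce_fence(struct dma_fence *f)
{
	struct sync_entry *e;

	hash_for_each_possible(sync_ht, e, node, f->context) {
		if (e->fence->context != f->context)
			continue;

		/* same context: keep whichever fence signals later */
		if (dma_fence_is_later(f, e->fence)) {
			dma_fence_put(e->fence);
			e->fence = dma_fence_get(f);
		}
		return 0;
	}

	e = kmalloc(sizeof(*e), GFP_KERNEL);
	if (!e)
		return -ENOMEM;

	e->fence = dma_fence_get(f);
	hash_add(sync_ht, &e->node, f->context);
	return 0;
}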