Re: [PATCH 2/5] drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep

Thomas Hellstrom <thomas@xxxxxxxxxxxx> · Wed, 22 Jan 2014 11:27:14 +0100

On 01/22/2014 10:55 AM, Maarten Lankhorst wrote:
> op 22-01-14 10:40, Thomas Hellstrom schreef:
>> On 01/22/2014 09:19 AM, Maarten Lankhorst wrote:
>>> op 21-01-14 18:44, Thomas Hellstrom schreef:
>>>> On 01/21/2014 04:29 PM, Maarten Lankhorst wrote:
>>>>> Hey,
>>>>>
>>>>> op 21-01-14 16:17, Thomas Hellstrom schreef:
>>>>>> Maarten, for this and the other patches in this series,
>>>>>>
>>>>>> I seem to recall we have this discussion before?
>>>>>> IIRC I stated that reservation was a too heavy-weight lock to
>>>>>> hold to
>>>>>> determine whether a buffer was idle? It's a pretty nasty thing to
>>>>>> build in.
>>>>>>
>>>>> I've sent this patch after determining that this already didn't
>>>>> end up
>>>>> being heavyweight.
>>>>> Most places were already using the fence_lock and reservation, I just
>>>>> fixed up the few
>>>>> places that didn't hold a reservation while waiting. Converting the
>>>>> few places that didn't
>>>>> ended up being trivial, so I thought I'd submit it.
>>>> Actually the only *valid* reason for holding a reservation when
>>>> waiting
>>>> for idle is
>>>> 1) You want to block further command submission on the buffer.
>>>> 2) You want to switch GPU engine and don't have access to gpu
>>>> semaphores
>>>> / barriers.
>>>>
>>>> Reservation has the nasty side effect that it blocks command
>>>> submission
>>>> and pins the buffer (in addition now makes the evict list traversals
>>>> skip the buffer) which in general is *not* necessary for most wait
>>>> cases, so we should instead actually convert the wait cases that don't
>>>> fulfill 1) and 2) above in the other direction if we have performance
>>>> and latency-reduction in mind. I can't see how a spinlock protecting a
>>>> fence pointer or fence list is stopping you from using RW fences as
>>>> long
>>>> as the spinlock is held while manipulating the fence list?
>>>>
>>> You wish. Fine I'll enumerate all cases of ttm_bo_wait (with the
>>> patchset, though) and enumerate if they can be changed to work without
>>> reservation or not.
>>>
>>> ttm/ttm_bo.c
>>> ttm_bo_cleanup_refs_or_queue: needs reservation and ttm_bo_wait to
>>> finish for the direct destroy fastpath, if either fails it needs to be
>>> queued. Cannot work without reservation.
>> Doesn't block and no significant reservation contention expected.
>>
>>> ttm_bo_cleanup_refs_and_unlock: already drops reservation to wait,
>>> doesn't need to re-acquire. Simply reordering ttm_bo_wait until after
>>> re-reserve is enough.
>> Currently follows the above rules.
>>
>>> ttm_bo_evict: already has the reservation, cannot be dropped since
>>> only trylock is allowed. Dropping reservation would cause badness,
>>> cannot be converted.
>> Follows rule 2 above. We're about to move the buffer and if that can't
>> be pipelined using the GPU (which TTM currently doesn't allow), we need
>> to wait. Although eviction should be low priority compared to new
>> command submission, so I can't really see why we couldn't wait before
>> trying to reserve here?
>>
>>> ttm_bo_move_buffer: called from ttm_bo_validate, cannot drop
>>> reservation for same reason as ttm_bo_evict. It might be part of a
>>> ticketed reservation so really don't drop lock here.
>> Part of command submission and as such follows rule 2 above. If we can
>> pipeline the move with the GPU, no need to wait (but needs to be
>> implemented, of course).
>>
>>> ttm_bo_synccpu_write_grab: the wait could be converted to be done
>>> afterwards, without  fence_lock. But in this case reservation could
>>> take the role of fence_lock too,
>>>
>>> so no separate fence_lock would be needed.
>> With the exception that reservation is more likely to be contended.
> True but rule 1.
>>> ttm_bo_swapout: see ttm_bo_evict.
>>>
>>> ttm/ttm_bo_util.c:
>>> ttm_bo_move_accel_cleanup: calls ttm_bo_wait, cannot drop lock, see
>>> ttm_bo_move_buffer, can be called from that function.
>> Rule 2.
>>
>>> ttm/ttm_bo_vm.c
>>> ttm_bo_vm_fault_idle: I guess you COULD drop the reservation here, but
>>> you already had the reservation, so a similar optimization to
>>> ttm_bo_synccpu_write_grab could be done without requiring fence_lock.
>>> If you would write it like that, you would end up with a patch similar
>>> to drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep. I think
>>> we should do this, an
>>>
>>> Ok, so the core does NOT need fence_lock because we can never drop
>>> reservations except in synccpu_write_grab and maybe
>>> ttm_bo_vm_fault_idle, but even in those cases reservation is done. So
>>> that could be used instead of fence_lock.
>>>
>>> nouveau_gem_ioctl_cpu_prep:
>>> Either block on a global spinlock or a local reservation lock. Doesn't
>>> matter much which, I don't need the need to keep a global lock for
>>> this function...
>>> 2 cases can happen in the trylock reservation failure case: buffer is
>>> not reserved, so it's not in the process of being evicted. buffer is
>>> reserved, which means it's being used in command submission right now,
>>> or in one of the functions described above (eg not idle).
>>>
>>> nouveau_gem_pushbuf_reloc_apply:
>>> has to call ttm_bo_wait with reservation, cannot be dropped.
>>>
>>> So for core ttm and nouveau the fence_lock is never needed, radeon has
>>> only 1 function that calls ttm_bo_wait which uses a reservation too.
>>> It doesn't need the fence_lock either.
>> And vmwgfx now also has a syccpu IOCTL (see drm-next).
>>
>> So assuming that we converted the functions that can be converted to
>> wait outside of reservation, the same way you have done with Nouveau,
>> leaving the ones that fall under 1) and 2) above, I would still argue
>> that a spinlock should be used because taking a reservation may
>> implicitly mean wait for gpu, and could give bad performance- and
>> latency charateristics. You shouldn't need to wait for gpu to check for
>> buffer idle.
> Except that without reservation you can't tell if the buffer is really
> idle, or is currently being
> used as part of some command submission/eviction before the fence
> pointer is set.
>

Yes, but when that matters, you're either in case 1 or case 2 again.
Otherwise, when you release the reservation, you still don't know.
A typical example of this is the vmwgfx synccpu ioctl, where you can
either choose to block command submission (not used currently)
or not (user-space inter-process synchronization). The former is a case
1 wait and holds reservation while waiting for idle and then ups
cpu_writers. The latter waits without reservation for previously
submitted rendering to finish.

/Thomas

> ~Maarten
>

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel