On 01/22/2014 10:55 AM, Maarten Lankhorst wrote: > op 22-01-14 10:40, Thomas Hellstrom schreef: >> On 01/22/2014 09:19 AM, Maarten Lankhorst wrote: >>> op 21-01-14 18:44, Thomas Hellstrom schreef: >>>> On 01/21/2014 04:29 PM, Maarten Lankhorst wrote: >>>>> Hey, >>>>> >>>>> op 21-01-14 16:17, Thomas Hellstrom schreef: >>>>>> Maarten, for this and the other patches in this series, >>>>>> >>>>>> I seem to recall we have this discussion before? >>>>>> IIRC I stated that reservation was a too heavy-weight lock to >>>>>> hold to >>>>>> determine whether a buffer was idle? It's a pretty nasty thing to >>>>>> build in. >>>>>> >>>>> I've sent this patch after determining that this already didn't >>>>> end up >>>>> being heavyweight. >>>>> Most places were already using the fence_lock and reservation, I just >>>>> fixed up the few >>>>> places that didn't hold a reservation while waiting. Converting the >>>>> few places that didn't >>>>> ended up being trivial, so I thought I'd submit it. >>>> Actually the only *valid* reason for holding a reservation when >>>> waiting >>>> for idle is >>>> 1) You want to block further command submission on the buffer. >>>> 2) You want to switch GPU engine and don't have access to gpu >>>> semaphores >>>> / barriers. >>>> >>>> Reservation has the nasty side effect that it blocks command >>>> submission >>>> and pins the buffer (in addition now makes the evict list traversals >>>> skip the buffer) which in general is *not* necessary for most wait >>>> cases, so we should instead actually convert the wait cases that don't >>>> fulfill 1) and 2) above in the other direction if we have performance >>>> and latency-reduction in mind. I can't see how a spinlock protecting a >>>> fence pointer or fence list is stopping you from using RW fences as >>>> long >>>> as the spinlock is held while manipulating the fence list? >>>> >>> You wish. Fine I'll enumerate all cases of ttm_bo_wait (with the >>> patchset, though) and enumerate if they can be changed to work without >>> reservation or not. >>> >>> ttm/ttm_bo.c >>> ttm_bo_cleanup_refs_or_queue: needs reservation and ttm_bo_wait to >>> finish for the direct destroy fastpath, if either fails it needs to be >>> queued. Cannot work without reservation. >> Doesn't block and no significant reservation contention expected. >> >>> ttm_bo_cleanup_refs_and_unlock: already drops reservation to wait, >>> doesn't need to re-acquire. Simply reordering ttm_bo_wait until after >>> re-reserve is enough. >> Currently follows the above rules. >> >>> ttm_bo_evict: already has the reservation, cannot be dropped since >>> only trylock is allowed. Dropping reservation would cause badness, >>> cannot be converted. >> Follows rule 2 above. We're about to move the buffer and if that can't >> be pipelined using the GPU (which TTM currently doesn't allow), we need >> to wait. Although eviction should be low priority compared to new >> command submission, so I can't really see why we couldn't wait before >> trying to reserve here? >> >>> ttm_bo_move_buffer: called from ttm_bo_validate, cannot drop >>> reservation for same reason as ttm_bo_evict. It might be part of a >>> ticketed reservation so really don't drop lock here. >> Part of command submission and as such follows rule 2 above. If we can >> pipeline the move with the GPU, no need to wait (but needs to be >> implemented, of course). >> >>> ttm_bo_synccpu_write_grab: the wait could be converted to be done >>> afterwards, without fence_lock. But in this case reservation could >>> take the role of fence_lock too, >>> >>> so no separate fence_lock would be needed. >> With the exception that reservation is more likely to be contended. > True but rule 1. >>> ttm_bo_swapout: see ttm_bo_evict. >>> >>> ttm/ttm_bo_util.c: >>> ttm_bo_move_accel_cleanup: calls ttm_bo_wait, cannot drop lock, see >>> ttm_bo_move_buffer, can be called from that function. >> Rule 2. >> >>> ttm/ttm_bo_vm.c >>> ttm_bo_vm_fault_idle: I guess you COULD drop the reservation here, but >>> you already had the reservation, so a similar optimization to >>> ttm_bo_synccpu_write_grab could be done without requiring fence_lock. >>> If you would write it like that, you would end up with a patch similar >>> to drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep. I think >>> we should do this, an >>> >>> Ok, so the core does NOT need fence_lock because we can never drop >>> reservations except in synccpu_write_grab and maybe >>> ttm_bo_vm_fault_idle, but even in those cases reservation is done. So >>> that could be used instead of fence_lock. >>> >>> nouveau_gem_ioctl_cpu_prep: >>> Either block on a global spinlock or a local reservation lock. Doesn't >>> matter much which, I don't need the need to keep a global lock for >>> this function... >>> 2 cases can happen in the trylock reservation failure case: buffer is >>> not reserved, so it's not in the process of being evicted. buffer is >>> reserved, which means it's being used in command submission right now, >>> or in one of the functions described above (eg not idle). >>> >>> nouveau_gem_pushbuf_reloc_apply: >>> has to call ttm_bo_wait with reservation, cannot be dropped. >>> >>> So for core ttm and nouveau the fence_lock is never needed, radeon has >>> only 1 function that calls ttm_bo_wait which uses a reservation too. >>> It doesn't need the fence_lock either. >> And vmwgfx now also has a syccpu IOCTL (see drm-next). >> >> So assuming that we converted the functions that can be converted to >> wait outside of reservation, the same way you have done with Nouveau, >> leaving the ones that fall under 1) and 2) above, I would still argue >> that a spinlock should be used because taking a reservation may >> implicitly mean wait for gpu, and could give bad performance- and >> latency charateristics. You shouldn't need to wait for gpu to check for >> buffer idle. > Except that without reservation you can't tell if the buffer is really > idle, or is currently being > used as part of some command submission/eviction before the fence > pointer is set. > Yes, but when that matters, you're either in case 1 or case 2 again. Otherwise, when you release the reservation, you still don't know. A typical example of this is the vmwgfx synccpu ioctl, where you can either choose to block command submission (not used currently) or not (user-space inter-process synchronization). The former is a case 1 wait and holds reservation while waiting for idle and then ups cpu_writers. The latter waits without reservation for previously submitted rendering to finish. /Thomas > ~Maarten > _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel