On 01/22/2014 01:38 PM, Maarten Lankhorst wrote:
> op 22-01-14 13:11, Thomas Hellstrom schreef:
>> On 01/22/2014 11:58 AM, Maarten Lankhorst wrote:
>>> op 22-01-14 11:27, Thomas Hellstrom schreef:
>>>> On 01/22/2014 10:55 AM, Maarten Lankhorst wrote:
>>>>> op 22-01-14 10:40, Thomas Hellstrom schreef:
>>>>>> On 01/22/2014 09:19 AM, Maarten Lankhorst wrote:
>>>>>>> op 21-01-14 18:44, Thomas Hellstrom schreef:
>>>>>>>> On 01/21/2014 04:29 PM, Maarten Lankhorst wrote:
>>>>>>>>> Hey,
>>>>>>>>>
>>>>>>>>> op 21-01-14 16:17, Thomas Hellstrom schreef:
>>>>>>>>>> Maarten, for this and the other patches in this series,
>>>>>>>>>>
>>>>>>>>>> I seem to recall we had this discussion before?
>>>>>>>>>> IIRC I stated that reservation was too heavy-weight a lock to
>>>>>>>>>> hold while determining whether a buffer was idle? It's a pretty
>>>>>>>>>> nasty thing to build in.
>>>>>>>>>>
>>>>>>>>> I sent this patch after determining that it didn't end up being
>>>>>>>>> heavyweight. Most places were already using the fence_lock and
>>>>>>>>> reservation; I just fixed up the few places that didn't hold a
>>>>>>>>> reservation while waiting. Converting those few places ended up
>>>>>>>>> being trivial, so I thought I'd submit it.
>>>>>>>> Actually the only *valid* reasons for holding a reservation while
>>>>>>>> waiting for idle are:
>>>>>>>> 1) You want to block further command submission on the buffer.
>>>>>>>> 2) You want to switch GPU engine and don't have access to gpu
>>>>>>>> semaphores / barriers.
>>>>>>>>
>>>>>>>> Reservation has the nasty side effect that it blocks command
>>>>>>>> submission and pins the buffer (and in addition now makes the
>>>>>>>> evict list traversals skip the buffer), which in general is *not*
>>>>>>>> necessary for most wait cases, so we should instead convert the
>>>>>>>> wait cases that don't fulfill 1) and 2) above in the other
>>>>>>>> direction if we have performance and latency reduction in mind.
>>>>>>>> I can't see how a spinlock protecting a fence pointer or fence
>>>>>>>> list stops you from using RW fences, as long as the spinlock is
>>>>>>>> held while manipulating the fence list?
>>>>>>>>
>>>>>>> You wish. Fine, I'll enumerate all cases of ttm_bo_wait (with the
>>>>>>> patchset, though) and note whether they can be changed to work
>>>>>>> without reservation or not.
>>>>>>>
>>>>>>> ttm/ttm_bo.c
>>>>>>> ttm_bo_cleanup_refs_or_queue: needs reservation and ttm_bo_wait to
>>>>>>> finish for the direct destroy fastpath; if either fails it needs
>>>>>>> to be queued. Cannot work without reservation.
>>>>>> Doesn't block, and no significant reservation contention expected.
>>>>>>
>>>>>>> ttm_bo_cleanup_refs_and_unlock: already drops the reservation to
>>>>>>> wait and doesn't need to re-acquire it. Simply reordering
>>>>>>> ttm_bo_wait until after the re-reserve is enough.
>>>>>> Currently follows the above rules.
>>>>>>
>>>>>>> ttm_bo_evict: already holds the reservation, which cannot be
>>>>>>> dropped since only trylock is allowed. Dropping the reservation
>>>>>>> would cause badness; cannot be converted.
>>>>>> Follows rule 2 above. We're about to move the buffer, and if that
>>>>>> can't be pipelined using the GPU (which TTM currently doesn't
>>>>>> allow), we need to wait.
>>>>>> Although eviction should be low priority compared to new command
>>>>>> submission, so I can't really see why we couldn't wait before
>>>>>> trying to reserve here?
>>>>>>
>>>>>>> ttm_bo_move_buffer: called from ttm_bo_validate; cannot drop the
>>>>>>> reservation for the same reason as ttm_bo_evict. It might be part
>>>>>>> of a ticketed reservation, so really don't drop the lock here.
>>>>>> Part of command submission and as such follows rule 2 above. If we
>>>>>> can pipeline the move with the GPU, there's no need to wait (but
>>>>>> that needs to be implemented, of course).
>>>>>>
>>>>>>> ttm_bo_synccpu_write_grab: the wait could be converted to be done
>>>>>>> afterwards, without fence_lock. But in this case reservation could
>>>>>>> take the role of fence_lock too, so no separate fence_lock would
>>>>>>> be needed.
>>>>>> With the exception that reservation is more likely to be contended.
>>>>> True, but rule 1.
>>>>>>> ttm_bo_swapout: see ttm_bo_evict.
>>>>>>>
>>>>>>> ttm/ttm_bo_util.c:
>>>>>>> ttm_bo_move_accel_cleanup: calls ttm_bo_wait, cannot drop the
>>>>>>> lock; see ttm_bo_move_buffer, it can be called from that function.
>>>>>> Rule 2.
>>>>>>
>>>>>>> ttm/ttm_bo_vm.c
>>>>>>> ttm_bo_vm_fault_idle: I guess you COULD drop the reservation here,
>>>>>>> but you already had the reservation, so a similar optimization to
>>>>>>> ttm_bo_synccpu_write_grab could be done without requiring
>>>>>>> fence_lock. If you would write it like that, you would end up with
>>>>>>> a patch similar to drm/nouveau: add reservation to
>>>>>>> nouveau_gem_ioctl_cpu_prep. I think we should do this, an
>>>>>>>
>>>>>>> Ok, so the core does NOT need fence_lock because we can never drop
>>>>>>> reservations except in synccpu_write_grab and maybe
>>>>>>> ttm_bo_vm_fault_idle, but even in those cases reservation is done.
>>>>>>> So that could be used instead of fence_lock.
>>>>>>>
>>>>>>> nouveau_gem_ioctl_cpu_prep:
>>>>>>> Either block on a global spinlock or a local reservation lock. It
>>>>>>> doesn't matter much which; I don't see the need to keep a global
>>>>>>> lock for this function...
>>>>>>> 2 cases can happen in the trylock reservation failure case: the
>>>>>>> buffer is not reserved, so it's not in the process of being
>>>>>>> evicted; or the buffer is reserved, which means it's being used in
>>>>>>> command submission right now, or in one of the functions described
>>>>>>> above (i.e. not idle).
>>>>>>>
>>>>>>> nouveau_gem_pushbuf_reloc_apply:
>>>>>>> has to call ttm_bo_wait with reservation held; cannot be dropped.
>>>>>>>
>>>>>>> So for core ttm and nouveau the fence_lock is never needed; radeon
>>>>>>> has only one function that calls ttm_bo_wait, which uses a
>>>>>>> reservation too. It doesn't need the fence_lock either.
>>>>>> And vmwgfx now also has a synccpu IOCTL (see drm-next).
>>>>>>
>>>>>> So assuming that we converted the functions that can be converted
>>>>>> to wait outside of reservation, the same way you have done with
>>>>>> nouveau, leaving the ones that fall under 1) and 2) above, I would
>>>>>> still argue that a spinlock should be used, because taking a
>>>>>> reservation may implicitly mean waiting for the gpu, and could give
>>>>>> bad performance and latency characteristics. You shouldn't need to
>>>>>> wait for the gpu to check for buffer idle.
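
[As a concrete sketch of the wait pattern being argued for here:
take a reference under the fence spinlock, then sleep with no locks
held. This is written against the 3.13-era TTM interfaces quoted in
this thread (bo->sync_obj guarded by bdev->fence_lock, plus the
sync_obj_* driver hooks); the function name is invented for
illustration, not code from the patchset.]

#include <drm/ttm/ttm_bo_api.h>
#include <drm/ttm/ttm_bo_driver.h>

/* Illustrative only: wait for idle without ever taking the
 * reservation. */
static int bo_wait_unreserved(struct ttm_buffer_object *bo,
			      bool interruptible)
{
	struct ttm_bo_device *bdev = bo->bdev;
	struct ttm_bo_driver *driver = bdev->driver;
	void *sync_obj = NULL;
	int ret = 0;

	spin_lock(&bdev->fence_lock);
	if (bo->sync_obj)
		sync_obj = driver->sync_obj_ref(bo->sync_obj);
	spin_unlock(&bdev->fence_lock);

	if (!sync_obj)
		return 0;	/* no fence attached: already idle */

	/* Sleep with no locks held: command submission and eviction
	 * can proceed on this buffer while we wait. */
	ret = driver->sync_obj_wait(sync_obj, false, interruptible);
	driver->sync_obj_unref(&sync_obj);
	return ret;
}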
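
[And the no-wait idle test that "shouldn't need to wait for the gpu",
in the same hypothetical style; neither function touches the
reservation, so an idle query never stalls command submission.]

/* Illustrative only: returns true iff no fence is attached or the
 * attached fence has signaled. Never sleeps. */
static bool bo_is_idle(struct ttm_buffer_object *bo)
{
	struct ttm_bo_device *bdev = bo->bdev;
	bool idle;

	spin_lock(&bdev->fence_lock);
	idle = !bo->sync_obj ||
	       bdev->driver->sync_obj_signaled(bo->sync_obj);
	spin_unlock(&bdev->fence_lock);

	return idle;
}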
>>>>> Except that without reservation you can't tell if the buffer is
>>>>> really idle, or is currently being used as part of some command
>>>>> submission/eviction before the fence pointer is set.
>>>>>
>>>> Yes, but when that matters, you're either in case 1 or case 2 again.
>>>> Otherwise, when you release the reservation, you still don't know.
>>>> A typical example of this is the vmwgfx synccpu ioctl, where you can
>>>> either choose to block command submission (not used currently) or
>>>> not (user-space inter-process synchronization). The former is a case
>>>> 1 wait and holds the reservation while waiting for idle, then ups
>>>> cpu_writers. The latter waits without reservation for previously
>>>> submitted rendering to finish.
>>> Yeah, you could, but what exactly are you waiting on then? If it's
>>> some specific existing rendering, I would argue that you should
>>> create an android userspace fence during command submission, or
>>> provide your own api to block on a specific fence in userspace.
>>>
>>> If you don't, then I think taking a reservation is not unreasonable.
>>> In the most common case the buffer is idle and not reserved, so it
>>> isn't contested. The actual waiting itself can be done without the
>>> reservation held, by taking a reference on the fence.
>> Yeah, here is where we disagree. I'm afraid people will start getting
>> sloppy with reservations and use them to protect more stuff, and after
>> a while they'll start wondering why the GPU command queue drains...
>>
>> Perhaps we could agree on a solution (building on one of your original
>> ideas) where we require reservation to modify the fence pointers and
>> the buffer object moving flag, but the structure holding the fence
>> pointer(s) is RCU safe, so that the pointers can be safely read under
>> an rcu lock.
> I think not modifying the fence pointer without reservation would be
> safest. I also don't think readers need the capability to clear
> sync_obj; this might simplify the implementation some.
>
> But my preferred option is getting rid of sync_obj completely, and
> moving to using reservation_object->fence_shared/exclusive, like the
> incomplete proof of concept conversion done in nouveau. But then I do
> need to grab the reservation lock to touch things, because fences may
> be set by the i915 driver I share the reservation_object with.
>
> Alternatively, could vmwgfx hold a spinlock when decrementing the
> fence refcount instead? Then we wouldn't need this in the core, and
> vmwgfx could use:

Maarten, requiring reservation to access the fence pointers really
turns my gut! Being able to read them under rcu is a remedy, but
something I figure would be the default and recommended thing to do.
Not a vmware exception.

This is about as far as I'm prepared to go.

/Thomas
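
[A rough illustration of the RCU-safe fence-pointer scheme proposed
above. All type and helper names here are invented; nothing like this
existed in the 2014 fence code. The locking discipline is the one
described in the thread: writers publish the fence pointer under the
reservation ww_mutex, readers only take rcu_read_lock() and grab a
reference.]

#include <linux/kref.h>
#include <linux/rcupdate.h>
#include <linux/ww_mutex.h>

struct example_fence {
	struct kref refcount;
	struct rcu_head rcu;	/* freed via kfree_rcu() on final put */
	/* ... driver fence state ... */
};

struct example_resv {
	struct ww_mutex lock;			/* the reservation */
	struct example_fence __rcu *excl;	/* written under lock only */
};

/* Writer side: reservation held; rcu_assign_pointer() ensures readers
 * see a fully initialized fence. */
static void example_resv_set_fence(struct example_resv *resv,
				   struct example_fence *fence)
{
	lockdep_assert_held(&resv->lock.base);
	rcu_assign_pointer(resv->excl, fence);
}

/* Reader side: no reservation taken. Acquire a reference inside the
 * RCU read section so the fence cannot be freed between the pointer
 * load and whatever wait follows; kref_get_unless_zero() catches a
 * race with the final put. */
static struct example_fence *example_resv_get_fence(struct example_resv *resv)
{
	struct example_fence *fence;

	rcu_read_lock();
	fence = rcu_dereference(resv->excl);
	if (fence && !kref_get_unless_zero(&fence->refcount))
		fence = NULL;	/* raced with the final put */
	rcu_read_unlock();

	return fence;
}

[The cost of this design is that fences must then be freed through an
RCU grace period, which is the usual price of lock-free readers.]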