Re: [PATCH 2/5] drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



op 22-01-14 13:11, Thomas Hellstrom schreef:
On 01/22/2014 11:58 AM, Maarten Lankhorst wrote:
op 22-01-14 11:27, Thomas Hellstrom schreef:
On 01/22/2014 10:55 AM, Maarten Lankhorst wrote:
op 22-01-14 10:40, Thomas Hellstrom schreef:
On 01/22/2014 09:19 AM, Maarten Lankhorst wrote:
op 21-01-14 18:44, Thomas Hellstrom schreef:
On 01/21/2014 04:29 PM, Maarten Lankhorst wrote:
Hey,

op 21-01-14 16:17, Thomas Hellstrom schreef:
Maarten, for this and the other patches in this series,

I seem to recall we have this discussion before?
IIRC I stated that reservation was a too heavy-weight lock to
hold to
determine whether a buffer was idle? It's a pretty nasty thing to
build in.

I've sent this patch after determining that this already didn't
end up
being heavyweight.
Most places were already using the fence_lock and reservation, I
just
fixed up the few
places that didn't hold a reservation while waiting. Converting the
few places that didn't
ended up being trivial, so I thought I'd submit it.
Actually the only *valid* reason for holding a reservation when
waiting
for idle is
1) You want to block further command submission on the buffer.
2) You want to switch GPU engine and don't have access to gpu
semaphores
/ barriers.

Reservation has the nasty side effect that it blocks command
submission
and pins the buffer (in addition now makes the evict list traversals
skip the buffer) which in general is *not* necessary for most wait
cases, so we should instead actually convert the wait cases that
don't
fulfill 1) and 2) above in the other direction if we have
performance
and latency-reduction in mind. I can't see how a spinlock
protecting a
fence pointer or fence list is stopping you from using RW fences as
long
as the spinlock is held while manipulating the fence list?

You wish. Fine I'll enumerate all cases of ttm_bo_wait (with the
patchset, though) and enumerate if they can be changed to work
without
reservation or not.

ttm/ttm_bo.c
ttm_bo_cleanup_refs_or_queue: needs reservation and ttm_bo_wait to
finish for the direct destroy fastpath, if either fails it needs
to be
queued. Cannot work without reservation.
Doesn't block and no significant reservation contention expected.

ttm_bo_cleanup_refs_and_unlock: already drops reservation to wait,
doesn't need to re-acquire. Simply reordering ttm_bo_wait until after
re-reserve is enough.
Currently follows the above rules.

ttm_bo_evict: already has the reservation, cannot be dropped since
only trylock is allowed. Dropping reservation would cause badness,
cannot be converted.
Follows rule 2 above. We're about to move the buffer and if that can't
be pipelined using the GPU (which TTM currently doesn't allow), we
need
to wait. Although eviction should be low priority compared to new
command submission, so I can't really see why we couldn't wait before
trying to reserve here?

ttm_bo_move_buffer: called from ttm_bo_validate, cannot drop
reservation for same reason as ttm_bo_evict. It might be part of a
ticketed reservation so really don't drop lock here.
Part of command submission and as such follows rule 2 above. If we can
pipeline the move with the GPU, no need to wait (but needs to be
implemented, of course).

ttm_bo_synccpu_write_grab: the wait could be converted to be done
afterwards, without  fence_lock. But in this case reservation could
take the role of fence_lock too,

so no separate fence_lock would be needed.
With the exception that reservation is more likely to be contended.
True but rule 1.
ttm_bo_swapout: see ttm_bo_evict.

ttm/ttm_bo_util.c:
ttm_bo_move_accel_cleanup: calls ttm_bo_wait, cannot drop lock, see
ttm_bo_move_buffer, can be called from that function.
Rule 2.

ttm/ttm_bo_vm.c
ttm_bo_vm_fault_idle: I guess you COULD drop the reservation here,
but
you already had the reservation, so a similar optimization to
ttm_bo_synccpu_write_grab could be done without requiring fence_lock.
If you would write it like that, you would end up with a patch
similar
to drm/nouveau: add reservation to nouveau_gem_ioctl_cpu_prep. I
think
we should do this, an

Ok, so the core does NOT need fence_lock because we can never drop
reservations except in synccpu_write_grab and maybe
ttm_bo_vm_fault_idle, but even in those cases reservation is done. So
that could be used instead of fence_lock.

nouveau_gem_ioctl_cpu_prep:
Either block on a global spinlock or a local reservation lock.
Doesn't
matter much which, I don't need the need to keep a global lock for
this function...
2 cases can happen in the trylock reservation failure case: buffer is
not reserved, so it's not in the process of being evicted. buffer is
reserved, which means it's being used in command submission right
now,
or in one of the functions described above (eg not idle).

nouveau_gem_pushbuf_reloc_apply:
has to call ttm_bo_wait with reservation, cannot be dropped.

So for core ttm and nouveau the fence_lock is never needed, radeon
has
only 1 function that calls ttm_bo_wait which uses a reservation too.
It doesn't need the fence_lock either.
And vmwgfx now also has a syccpu IOCTL (see drm-next).

So assuming that we converted the functions that can be converted to
wait outside of reservation, the same way you have done with Nouveau,
leaving the ones that fall under 1) and 2) above, I would still argue
that a spinlock should be used because taking a reservation may
implicitly mean wait for gpu, and could give bad performance- and
latency charateristics. You shouldn't need to wait for gpu to check
for
buffer idle.
Except that without reservation you can't tell if the buffer is really
idle, or is currently being
used as part of some command submission/eviction before the fence
pointer is set.

Yes, but when that matters, you're either in case 1 or case 2 again.
Otherwise, when you release the reservation, you still don't know.
A typical example of this is the vmwgfx synccpu ioctl, where you can
either choose to block command submission (not used currently)
or not (user-space inter-process synchronization). The former is a case
1 wait and holds reservation while waiting for idle and then ups
cpu_writers. The latter waits without reservation for previously
submitted rendering to finish.
Yeah you could, but what exactly are you waiting on then? If it's some
specific existing rendering,
I would argue that you should create an android userspace fence during
command submission,
or provide your own api to block on a specfic fence in userspace.

If you don't then I think taking a reservation is not unreasonable. In
the most common case the buffer is
idle and not reserved, so it isn't contested. The actual waiting
itself can be done without reservation held,
by taking a reference on the fence.
Yeah, here is where we disagree. I'm afraid people will start getting
sloppy with reservations and use them to protect more stuff, and after a
while they start wondering why the GPU command queue drains...

Perhaps we could agree on a solution (building on one of your original
ideas) where we require reservation to modify the fence pointers, and
the buffer object moving flag, but the structure holding the fence
pointer(s) is RCU safe, so that the pointers can be safely read under an
rcu lock.
I think not modifying the fence pointer without reservation would be safest.
I also don't think readers need the capability to clear sync_obj, this might
simplify the implementation some.

But my preferred option is getting rid of sync_obj completely, and move to
using reservation_object->fence_shared/exclusive, like the incomplete proof
of concept conversion done in nouveau. But then I do need to grab the
reservation lock to touch things, because fences may be set by the i915 driver
I share the reservation_object with.

Alternatively could vmwgfx hold a spinlock when decrementing fence refcount instead?
Then we wouldn't need this in the core, and vmwgfx could use:

spin_lock(&vmw_fence_lock);
fence = ACCESS_ONCE(bo->sync_obj);
if (fence && !kref_get_unless_zero(&fence->ref)) fence = NULL;
spin_unlock(&vmw_fence_lock);

internally in that function, preserving old semantics but without unsetting sync_obj
if no reservation is held. Full rcu might be slightly overkill.

~Maarten

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux