On 09.08.2018 at 16:22, Daniel Vetter wrote:
On Thu, Aug 9, 2018 at 3:58 PM, Christian König
<ckoenig.leichtzumerken@xxxxxxxxx> wrote:
On 09.08.2018 at 15:38, Daniel Vetter wrote:
On Thu, Aug 09, 2018 at 01:37:07PM +0200, Christian König wrote:
[SNIP]
See, to me the exclusive fence in the reservation object is not even remotely
related to implicit or explicit synchronization.
Hm, I guess that's the confusion then. The only reason we have the
exclusive fence is to implement cross-driver implicit syncing. What
else you do internally in your driver doesn't matter, as long as you
keep up that contract.
And it's intentionally not called write_fence or anything like that,
because that's not what it tracks.
Of course, any buffer moves the kernel does must also be tracked in the
exclusive fence, because userspace cannot know about them. So you
might have an exclusive fence set and also an explicit fence passed in
through the atomic ioctl. Aside: right now all drivers observe only
one or the other, not both, so they will break as soon as we start moving
shared buffers around, at least on Android or anything else using
explicit fencing.
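To make the aside concrete, here is a toy model, not kernel API (every type and function name is hypothetical), of a reservation object with one exclusive slot plus a list of shared fences, and of the dependency collection an atomic commit would need if it honored both the exclusive fence and the explicit in-fence instead of only one of them:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy fence: only tracks whether it has signaled. */
struct toy_fence {
	bool signaled;
};

#define TOY_MAX_SHARED 8

/* Toy reservation object: one exclusive slot plus a list of shared fences. */
struct toy_resv {
	struct toy_fence *excl;
	struct toy_fence *shared[TOY_MAX_SHARED];
	size_t shared_count;
};

/*
 * Collect what a scanout must wait for: the exclusive fence (which may be a
 * kernel buffer move that userspace knows nothing about) AND the explicit
 * in-fence passed in through the ioctl. Already-signaled fences are skipped.
 * Returns the number of fences written to out.
 */
static size_t toy_scanout_deps(const struct toy_resv *resv,
			       struct toy_fence *explicit_in,
			       struct toy_fence *out[2])
{
	size_t n = 0;

	if (resv->excl && !resv->excl->signaled)
		out[n++] = resv->excl;
	if (explicit_in && !explicit_in->signaled)
		out[n++] = explicit_in;
	assert(n <= 2);
	return n;
}
```

A driver that looked only at `explicit_in` would scan out a buffer the kernel may still be moving; one that looked only at `excl` would ignore the fence userspace handed in.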
Actually, both radeon and nouveau use the approach that shared fences
need to be waited on as well when they don't come from the current driver.
So here's my summary, as I understand things right now:
- for non-shared buffers at least, amdgpu uses explicit fencing, and
hence all fences caused by userspace end up as shared fences, whether
that's writes or reads. This means you end up with possibly multiple
write fences, but never any exclusive fences.
- for non-shared buffers the only exclusive fences amdgpu sets are for
buffer moves done by the kernel.
- amdgpu (kernel + userspace combo here) does not seem to have a
concept/tracking for when a buffer is used with implicit or explicit
fencing. It does however track all writes.
No, that is incorrect. It tracks all accesses to a buffer object in the
form of shared fences, we don't care if it is a write or not.
What we track as well is which client uses a BO last and as long as the
same client uses the BO we don't add any implicit synchronization.
Only when a BO is used by another client do we have implicit
synchronization for all command submissions. This behavior can be
disabled with a flag during BO creation.
- as a consequence, amdgpu needs to pessimistically assume that all
writes to shared buffers need to obey implicit fencing rules.
- for shared buffers (across process or drivers) implicit fencing does
_not_ allow concurrent writers. That limitation is why people want to
do explicit fencing, and it's the reason why there's only 1 slot for
an exclusive fence. Note I really mean concurrent here, a queue of in-flight
writes by different batches is perfectly fine. But it's a fully
ordered queue of writes.
- but as a consequence of amdgpu's lack of implicit fencing, and hence
its need to pessimistically assume there are multiple write fences, amdgpu
needs to put multiple fences behind the single exclusive slot. This is
a limitation imposed by the amdgpu stack, not something inherent to
how implicit fencing works.
- Chris Wilson's patch implements all this (and afaics with a bit more
coffee, correctly).
If you want to be less pessimistic in amdgpu for shared buffers, you
need to start tracking which shared buffer accesses need implicit and
which explicit sync. What you can't do is suddenly create more than 1
exclusive fence, that's not how implicit fencing works. Another thing
you cannot do is force everyone else (in non-amdgpu or core code) to
sync against _all_ writes, because that forces implicit syncing. Which
people very much don't want.
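Putting multiple write fences behind the single exclusive slot can be pictured with a toy composite fence that signals only once all of its members have. This is a hypothetical model, not Chris Wilson's patch and not kernel API; in the real kernel, `dma_fence_array` plays a comparable aggregating role. The point it illustrates is that the reservation object still holds exactly one exclusive fence:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy fence: a leaf with a signaled flag, or a composite. */
struct toy_fence {
	bool signaled;              /* used only by leaf fences */
	struct toy_fence **members; /* non-NULL for a composite fence */
	size_t num_members;
};

#define TOY_MAX_SHARED 8

/* Toy reservation object: a single exclusive slot plus shared fences. */
struct toy_resv {
	struct toy_fence *excl;
	struct toy_fence *shared[TOY_MAX_SHARED];
	size_t shared_count;
};

/* A composite signals only after every one of its members has signaled. */
static bool toy_fence_is_signaled(const struct toy_fence *f)
{
	if (!f->members)
		return f->signaled;
	for (size_t i = 0; i < f->num_members; i++)
		if (!toy_fence_is_signaled(f->members[i]))
			return false;
	return true;
}

/*
 * Pessimistically treat every shared fence as a potential write and publish
 * them all behind the one exclusive slot. `storage` must have room for
 * shared_count + 1 pointers; the previous exclusive fence is kept as a
 * member so the queue of writes stays ordered.
 */
static void toy_publish_as_exclusive(struct toy_resv *resv,
				     struct toy_fence *composite,
				     struct toy_fence **storage)
{
	size_t n = 0;

	if (resv->excl)
		storage[n++] = resv->excl;
	for (size_t i = 0; i < resv->shared_count; i++)
		storage[n++] = resv->shared[i];
	assert(n <= TOY_MAX_SHARED + 1);

	composite->signaled = false;
	composite->members = storage;
	composite->num_members = n;
	resv->excl = composite; /* still exactly one exclusive fence */
}
```

Anyone syncing implicitly against `excl` now waits for every write, which is exactly the serialization that makes this pessimistic.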
I also do see the problem that most other hardware doesn't need that
functionality, because it is driven by a single engine. That's why I
tried to keep the overhead as low as possible.
But at least for amdgpu (and I strongly suspect for nouveau as well) it
is absolutely vital in a number of cases to allow concurrent accesses
from the same client even when the BO is then later used with implicit
synchronization.
This is also the reason why the current workaround is so problematic for
us: as soon as the BO is shared with another (non-amdgpu) device,
all command submissions to it will be serialized even when they come
from the same client.
Would it be an option to extend the concept of the BO "owner" that amdgpu
uses to other drivers as well?
When you already have explicit synchronization inside your client, but
not between clients (e.g. when still using DRI2 or DRI3), this could be
rather beneficial for other drivers as well.
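The difference between the current workaround and the per-client "owner" rule can be sketched as two toy decision functions. The names, the `explicit_sync` flag and the client id type are hypothetical stand-ins, not amdgpu's actual code or uAPI. The sketch encodes only the behavior stated in this thread: same client, no implicit sync; different client, sync; opt-out at BO creation; and serialization of everything once the BO is shared with a non-amdgpu device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical identifier for a client submitting work to the GPU. */
typedef unsigned int toy_client_id;

struct toy_bo {
	toy_client_id last_owner; /* client that submitted work on the BO last */
	bool explicit_sync;       /* hypothetical opt-out chosen at BO creation */
	bool shared_with_foreign; /* exported to a device from another driver */
};

/* Owner rule: only a change of client triggers implicit synchronization. */
static bool toy_owner_rule_needs_sync(struct toy_bo *bo, toy_client_id client)
{
	assert(bo != NULL);
	bool wait = !bo->explicit_sync && bo->last_owner != client;

	bo->last_owner = client; /* the submitting client becomes the owner */
	return wait;
}

/*
 * Current workaround as described in the mail: once the BO is shared with a
 * foreign device, every submission is serialized, even from the same client.
 */
static bool toy_workaround_needs_sync(struct toy_bo *bo, toy_client_id client)
{
	bool owner_wait = toy_owner_rule_needs_sync(bo, client);

	return bo->shared_with_foreign || owner_wait;
}
```

Extending the owner concept to other drivers would roughly mean letting them apply something like `toy_owner_rule_needs_sync` instead of the unconditional serialization in `toy_workaround_needs_sync`.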
Regards,
Christian.
-Daniel
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel