Am 07.08.2018 um 14:59 schrieb Daniel Vetter:
On Tue, Aug 07, 2018 at 02:51:50PM +0200, Christian König wrote:
Am 07.08.2018 um 14:43 schrieb Daniel Vetter:
On Tue, Aug 07, 2018 at 01:23:19PM +0200, Christian König wrote:
Am 07.08.2018 um 13:05 schrieb Chris Wilson:
amdgpu only uses shared-fences internally, but dmabuf importers rely on
implicit write hazard tracking via the reservation_object.fence_excl.
Well I would rather suggest a solution where we stop doing this.
The problem here is that i915 assumes that a write operation always needs
exclusive access to an object protected by an reservation object.
At least for amdgpu, radeon and nouveau this assumption is incorrect, but
only amdgpu has a design where userspace is not lying to the kernel about
it's read/write access.
What we should do instead is to add a flag to each shared fence to note if
it is a write operation or not. Then we can trivially add a function to wait
on on those in i915.
I should have pushed harder for this solution when the problem came up
initially,
For shared buffers in an implicit syncing setup like dri2 the exclusive
fence _is_ your write fence.
And exactly that is absolutely not correct if you ask me.
The exclusive fence is for two use cases, the first one is for drivers which
don't care about concurrent accesses and only use serialized accesses and
the second is for kernel uses under the hood of userspace, evictions, buffer
moves etc..
What i915 does is to abuse that concept for write once read many situations,
and that in turn is the bug we need to fix here.
Again, the exclusive fence was added for implicit sync usage like dri2/3.
_Not_ for your own buffer manager. If you want to separate these two
usages, then I guess we can do that, but claiming that i915 is the odd one
out just aint true. You're arguing against all other kms drivers we have
here.
No I'm not. I discussed exactly that use case with Maarten when the
reservation object was introduced.
I think that Maarten named the explicit fence on purpose "explicit"
fence and not "write" fence to make that distinction clear.
I have to admit that this wasn't really documented, but it indeed was
the original purpose to get away from the idea that writes needs to be
exclusive.
That's how this stuff works. Note it's only
for implicit fencing for dri2 shared buffers. i915 lies as much as
everyone else for explicit syncing.
That is really really ugly and should be fixed instead.
It works and is uapi now. What's the gain in "fixing" it?
It allows you to implement explicit fencing in the uapi without breaking
backward compatibility.
See the situation with amdgpu and intel is the same as with userspace
with mixed implicit and explicit fencing.
If that isn't fixed we will run into the same problem when DRI3 gets
implicit fencing and some applications starts to use it while others
still rely on the explicit fencing.
Regards,
Christian.
And this was
discussed at length when intel and freedreno folks talked about
implementing explicit sync and android sync pts.
Now as long as you only share within amdgpu you can do whatever you want
to, but for anything shared outside of amdgpu, if the buffer isn't shared
through an explicit syncing protocol, then you do have to set the
exclusive fence. That's at least how this stuff works right now with all
other drivers.
For i915 we had to do some uapi trickery to make this work in all cases,
since only really userspace knows whether a write should be a shared or
exclusive write. But that's stricly an issue limited to your driver, and
dosn't need changes to reservation object all throughout the stack.
You don't need to change the reservation object all throughout the stack. A
simple flag if a shared fence is a write or not should be doable.
Give me a day or two to prototype that,
See my other reply, i think that's only needed for amdgpu internal
book-keeping, so that you can create the correct exclusive fence on first
export. But yeah, adding a bitfield to track which fences need to become
exclusive shouldn't be a tricky solution to implement for amdgpu.
-Daniel
Christian.
Aside: If you want to attach multiple write fences to the exclusive fence,
that should be doable with the array fences.
-Daniel
Christian.
For example, the importer use the write hazard for timing a page flip to
only occur after the exporter has finished flushing its write into the
surface. As such, on exporting a dmabuf, we must either flush all
outstanding fences (for we do not know which are writes and should have
been exclusive) or alternatively create a new exclusive fence that is
the composite of all the existing shared fences, and so will only be
signaled when all earlier fences are signaled (ensuring that we can not
be signaled before the completion of any earlier write).
v2: reservation_object is already locked by amdgpu_bo_reserve()
Testcase: igt/amd_prime/amd-to-i915
Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
Cc: Alex Deucher <alexander.deucher@xxxxxxx>
Cc: "Christian König" <christian.koenig@xxxxxxx>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c | 68 ++++++++++++++++++++---
1 file changed, 60 insertions(+), 8 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
index 1c5d97f4b4dd..576a83946c25 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_prime.c
@@ -37,6 +37,7 @@
#include "amdgpu_display.h"
#include <drm/amdgpu_drm.h>
#include <linux/dma-buf.h>
+#include <linux/dma-fence-array.h>
static const struct dma_buf_ops amdgpu_dmabuf_ops;
@@ -188,6 +189,57 @@ amdgpu_gem_prime_import_sg_table(struct drm_device *dev,
return ERR_PTR(ret);
}
+static int
+__reservation_object_make_exclusive(struct reservation_object *obj)
+{
+ struct reservation_object_list *fobj;
+ struct dma_fence_array *array;
+ struct dma_fence **fences;
+ unsigned int count, i;
+
+ fobj = reservation_object_get_list(obj);
+ if (!fobj)
+ return 0;
+
+ count = !!rcu_access_pointer(obj->fence_excl);
+ count += fobj->shared_count;
+
+ fences = kmalloc_array(count, sizeof(*fences), GFP_KERNEL);
+ if (!fences)
+ return -ENOMEM;
+
+ for (i = 0; i < fobj->shared_count; i++) {
+ struct dma_fence *f =
+ rcu_dereference_protected(fobj->shared[i],
+ reservation_object_held(obj));
+
+ fences[i] = dma_fence_get(f);
+ }
+
+ if (rcu_access_pointer(obj->fence_excl)) {
+ struct dma_fence *f =
+ rcu_dereference_protected(obj->fence_excl,
+ reservation_object_held(obj));
+
+ fences[i] = dma_fence_get(f);
+ }
+
+ array = dma_fence_array_create(count, fences,
+ dma_fence_context_alloc(1), 0,
+ false);
+ if (!array)
+ goto err_fences_put;
+
+ reservation_object_add_excl_fence(obj, &array->base);
+ return 0;
+
+err_fences_put:
+ for (i = 0; i < count; i++)
+ dma_fence_put(fences[i]);
+ kfree(fences);
+ return -ENOMEM;
+}
+
/**
* amdgpu_gem_map_attach - &dma_buf_ops.attach implementation
* @dma_buf: shared DMA buffer
@@ -219,16 +271,16 @@ static int amdgpu_gem_map_attach(struct dma_buf *dma_buf,
if (attach->dev->driver != adev->dev->driver) {
/*
- * Wait for all shared fences to complete before we switch to future
- * use of exclusive fence on this prime shared bo.
+ * We only create shared fences for internal use, but importers
+ * of the dmabuf rely on exclusive fences for implicitly
+ * tracking write hazards. As any of the current fences may
+ * correspond to a write, we need to convert all existing
+ * fences on the resevation object into a single exclusive
+ * fence.
*/
- r = reservation_object_wait_timeout_rcu(bo->tbo.resv,
- true, false,
- MAX_SCHEDULE_TIMEOUT);
- if (unlikely(r < 0)) {
- DRM_DEBUG_PRIME("Fence wait failed: %li\n", r);
+ r = __reservation_object_make_exclusive(bo->tbo.resv);
+ if (r)
goto error_unreserve;
- }
}
/* pin buffer into GTT */
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx