I was thinking about this more recently. I was initially considering "maybe this should be a per-BO import," but I couldn't think of anything in the GL model that would actually benefit given its not "true" bindless and there's no update-after-bind there. Worth others more familiar with GL asking that question to themselves also. I am definitely not totally up on what's possible there. Overall, I think I am OK with this approach, even though I think mixing implicit and explicit sync is gross, and I want the pain that is implicit sync to just go away forever. :-) - Joshie 🐸✨ On August 7, 2024 4:39:32 PM GMT+01:00, Faith Ekstrand <faith@xxxxxxxxxxxxx> wrote: >Previously, AMDGPU_GEM_CREATE_EXPLICIT_SYNC was used to disable implicit >synchronization on BOs when explicit synchronization can be used. The >problem is that this flag is per-BO and affects all amdgpu users in the >system, not just the usermode drver which sets it. This can lead to >some unintended consequences for userspace if not used carefully. > >Since the introduction of DMA_BUF_IOCTL_EXPORT_SYNC_FILE and >DMA_BUF_IOCTL_IMPORT_SYNC_FILE, many userspace window system components >have grown the ability to convert between the Vulkan explicit sync model >and the legacy implicit sync model used by X11 and Wayland in the past. >This allows both old and new components to exist simultaneously and talk >to each other. In particular, XWayland is able to convert between the >two to let Vulkan apps work seamlessly with older X11 compositors that >aren't aware of explicit synchronizaiton. This is rapidly becoming the >backbone of synchronization in the Linux window system space. > >Unfortunately, AMDGPU_GEM_CREATE_EXPLICIT_SYNC breaks this because it >disables all implicit synchronization on the given BO, regardless of >which userspace driver is rendering with it and regardless of the >assumptions made by the client application. In particular, this is >causing issues with RADV and radeonsi. RADV sets the flag to disable >implicit sync on window system buffers so that it can safely have them >resident at all times without causing internal over-synchronization. >The BO is then handed off to a compositor which composites using >radeonsi. If the compositor uses the EGL_ANDROID_native_fence_sync >extension to pass explicit sync files through to radeonsi, then >everything is fine. However, if that buffer ever gets handed to an >instance of radeonsi with any assumption of implicit synchronization, >radeonsi won't be able sync on the BO because RADV disabled implict sync >on that BO system-wide. It doesn't matter whether some window system >component used DMA_BUF_IOCTL_IMPORT_SYNC_FILE to set the appropriate >fence on the BO, amdgpu will ignore it. > >This new flag disables implicit sync on the context rather than the BO. >This way RADV can disable implicit sync (which is what RADV has always >wanted) without affecting other components in the system. If RADV (or >some other driver) wants implicit sync on some BO, it can use >DMA_BUF_IOCTL_EXPORT_SYNC_FILE and DMA_BUF_IOCTL_IMPORT_SYNC_FILE to >manually synchronize with other implicit-sync components. This is the >approach we've taken with NVK/nouveau, ANV/xe, and similar to the >approach taken by ANV/i915 and it works well for those drivers. > >Ideally, I would like to see something like this back-ported to at least >the kernel where DMA_BUF_IOCTL_IMPORT/EXPORT_SYNC_FILE were introduced >so that we don't have to wait another year for the fix to reach users. >However, I understand that back-porting UAPI is problematic and I'll >leave that decision up to the amdgpu maintainers. Michel suggested that >a new CTX_OP would make more sense if we want to back-port it but the >create flag made more sense to me from an API design PoV. > >Signed-off-by: Faith Ekstrand <faith.ekstrand@xxxxxxxxxxxxx> >Cc: Alex Deucher <alexander.deucher@xxxxxxx> >Cc: Christian König <christian.koenig@xxxxxxx> >Cc: David Airlie <airlied@xxxxxxxxx> >Cc: Michel Dänzer <mdaenzer@xxxxxxxxxx> >Cc: Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx> >--- > drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 3 ++- > drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 12 ++++++++---- > drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h | 7 +++++++ > include/uapi/drm/amdgpu_drm.h | 12 +++++++++++- > 4 files changed, 28 insertions(+), 6 deletions(-) > >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >index ec888fc6ead8..8410b4426541 100644 >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c >@@ -1196,7 +1196,8 @@ static int amdgpu_cs_sync_rings(struct amdgpu_cs_parser *p) > struct dma_resv *resv = bo->tbo.base.resv; > enum amdgpu_sync_mode sync_mode; > >- sync_mode = amdgpu_bo_explicit_sync(bo) ? >+ sync_mode = (amdgpu_ctx_explicit_sync(p->ctx) || >+ amdgpu_bo_explicit_sync(bo)) ? > AMDGPU_SYNC_EXPLICIT : AMDGPU_SYNC_NE_OWNER; > r = amdgpu_sync_resv(p->adev, &p->sync, resv, sync_mode, > &fpriv->vm); >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >index 5cb33ac99f70..a304740ccedf 100644 >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c >@@ -318,7 +318,8 @@ static int amdgpu_ctx_get_stable_pstate(struct amdgpu_ctx *ctx, > } > > static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, int32_t priority, >- struct drm_file *filp, struct amdgpu_ctx *ctx) >+ uint32_t flags, struct drm_file *filp, >+ struct amdgpu_ctx *ctx) > { > struct amdgpu_fpriv *fpriv = filp->driver_priv; > u32 current_stable_pstate; >@@ -334,6 +335,7 @@ static int amdgpu_ctx_init(struct amdgpu_ctx_mgr *mgr, int32_t priority, > ctx->mgr = mgr; > spin_lock_init(&ctx->ring_lock); > >+ ctx->flags = flags; > ctx->reset_counter = atomic_read(&mgr->adev->gpu_reset_counter); > ctx->reset_counter_query = ctx->reset_counter; > ctx->generation = amdgpu_vm_generation(mgr->adev, &fpriv->vm); >@@ -474,6 +476,7 @@ static int amdgpu_ctx_alloc(struct amdgpu_device *adev, > struct amdgpu_fpriv *fpriv, > struct drm_file *filp, > int32_t priority, >+ uint32_t flags, > uint32_t *id) > { > struct amdgpu_ctx_mgr *mgr = &fpriv->ctx_mgr; >@@ -493,7 +496,7 @@ static int amdgpu_ctx_alloc(struct amdgpu_device *adev, > } > > *id = (uint32_t)r; >- r = amdgpu_ctx_init(mgr, priority, filp, ctx); >+ r = amdgpu_ctx_init(mgr, priority, flags, filp, ctx); > if (r) { > idr_remove(&mgr->ctx_handles, *id); > *id = 0; >@@ -666,7 +669,7 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data, > struct drm_file *filp) > { > int r; >- uint32_t id, stable_pstate; >+ uint32_t id, stable_pstate, flags; > int32_t priority; > > union drm_amdgpu_ctx *args = data; >@@ -675,6 +678,7 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data, > > id = args->in.ctx_id; > priority = args->in.priority; >+ flags = args->in.flags; > > /* For backwards compatibility, we need to accept ioctls with garbage > * in the priority field. Garbage values in the priority field, result >@@ -685,7 +689,7 @@ int amdgpu_ctx_ioctl(struct drm_device *dev, void *data, > > switch (args->in.op) { > case AMDGPU_CTX_OP_ALLOC_CTX: >- r = amdgpu_ctx_alloc(adev, fpriv, filp, priority, &id); >+ r = amdgpu_ctx_alloc(adev, fpriv, filp, priority, flags, &id); > args->out.alloc.ctx_id = id; > break; > case AMDGPU_CTX_OP_FREE_CTX: >diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h >index 85376baaa92f..9431c8d2ea59 100644 >--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h >+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.h >@@ -45,6 +45,7 @@ struct amdgpu_ctx_entity { > struct amdgpu_ctx { > struct kref refcount; > struct amdgpu_ctx_mgr *mgr; >+ uint32_t flags; > unsigned reset_counter; > unsigned reset_counter_query; > uint64_t generation; >@@ -84,6 +85,12 @@ struct dma_fence *amdgpu_ctx_get_fence(struct amdgpu_ctx *ctx, > bool amdgpu_ctx_priority_is_valid(int32_t ctx_prio); > void amdgpu_ctx_priority_override(struct amdgpu_ctx *ctx, int32_t ctx_prio); > >+static inline bool >+amdgpu_ctx_explicit_sync(struct amdgpu_ctx *ctx) >+{ >+ return ctx->flags & AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC; >+} >+ > int amdgpu_ctx_ioctl(struct drm_device *dev, void *data, > struct drm_file *filp); > >diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h >index 96e32dafd4f0..e9d87a6e3d86 100644 >--- a/include/uapi/drm/amdgpu_drm.h >+++ b/include/uapi/drm/amdgpu_drm.h >@@ -125,7 +125,14 @@ extern "C" { > #define AMDGPU_GEM_CREATE_VRAM_CONTIGUOUS (1 << 5) > /* Flag that BO is always valid in this VM */ > #define AMDGPU_GEM_CREATE_VM_ALWAYS_VALID (1 << 6) >-/* Flag that BO sharing will be explicitly synchronized */ >+/* Flag that BO sharing will be explicitly synchronized >+ * >+ * This flag should not be used unless the client can guarantee that no >+ * other driver which ever touches this BO will ever want to use implicit >+ * synchronization as it disables implicit sync on this BO system-wide. >+ * Instead, drivers which use an explicit synchronization model should >+ * prefer AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC. >+ */ > #define AMDGPU_GEM_CREATE_EXPLICIT_SYNC (1 << 7) > /* Flag that indicates allocating MQD gart on GFX9, where the mtype > * for the second page onward should be set to NC. It should never >@@ -240,6 +247,9 @@ union drm_amdgpu_bo_list { > #define AMDGPU_CTX_OP_GET_STABLE_PSTATE 5 > #define AMDGPU_CTX_OP_SET_STABLE_PSTATE 6 > >+/* indicate that all synchronization will be explicit */ >+#define AMDGPU_CTX_ALLOC_FLAGS_EXPLICIT_SYNC (1<<0) >+ > /* GPU reset status */ > #define AMDGPU_CTX_NO_RESET 0 > /* this the context caused it */