On Fri, Feb 28, 2025 at 3:06 PM Ian Forbes <ian.forbes@xxxxxxxxxxxx> wrote: > > Because sync_files are passive waiters they do not participate in > the processing of fences like the traditional vmw_fence_wait IOCTL. > If userspace exclusively uses sync_files for synchronization then > nothing in the kernel actually processes fence updates as interrupts > for fences are masked and ignored if the kernel does not indicate to the > SVGA device that there are active waiters. > > This oversight results in a bug where the entire GUI can freeze waiting > on a sync_file that will never be signalled as we've masked the interrupts > to signal its completion. This bug is incredibly racy as any process which > interacts with the fencing code via the 3D stack can process the stuck > fences on behalf of the stuck process causing it to run again. Even a > simple app like eglinfo is enough to resume the stuck process. Usually > this bug is seen at a login screen like GDM because there are no other > 3D apps running. > > By adding a seqno waiter we re-enable interrupt based processing of the > dma_fences associated with the sync_file which is signalled as part of a > dma_fence_callback. > > This has likely been broken since it was initially added to the kernel in > 2017 but has gone unnoticed until mutter recently started using sync_files > heavily over the course of 2024 as part of their explicit sync support. > > Fixes: c906965dee22 ("drm/vmwgfx: Add export fence to file descriptor support") > Signed-off-by: Ian Forbes <ian.forbes@xxxxxxxxxxxx> > --- > drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c | 26 +++++++++++++++++++++++++ > 1 file changed, 26 insertions(+) > > diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c > index 2e52d73eba48..ea741bc4ac3f 100644 > --- a/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c > +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_execbuf.c > @@ -4086,6 +4086,23 @@ static int vmw_execbuf_tie_context(struct vmw_private *dev_priv, > return 0; > } > > +/* > + * DMA fence callback to remove a seqno_waiter > + */ > +struct seqno_waiter_rm_context { > + struct dma_fence_cb base; > + struct vmw_private *dev_priv; > +}; > + > +static void seqno_waiter_rm_cb(struct dma_fence *f, struct dma_fence_cb *cb) > +{ > + struct seqno_waiter_rm_context *ctx = > + container_of(cb, struct seqno_waiter_rm_context, base); > + > + vmw_seqno_waiter_remove(ctx->dev_priv); > + kfree(ctx); > +} > + > int vmw_execbuf_process(struct drm_file *file_priv, > struct vmw_private *dev_priv, > void __user *user_commands, void *kernel_commands, > @@ -4266,6 +4283,15 @@ int vmw_execbuf_process(struct drm_file *file_priv, > } else { > /* Link the fence with the FD created earlier */ > fd_install(out_fence_fd, sync_file->file); > + struct seqno_waiter_rm_context *ctx = > + kmalloc(sizeof(*ctx), GFP_KERNEL); > + ctx->dev_priv = dev_priv; > + vmw_seqno_waiter_add(dev_priv); > + if (dma_fence_add_callback(&fence->base, &ctx->base, > + seqno_waiter_rm_cb) < 0) { > + vmw_seqno_waiter_remove(dev_priv); > + kfree(ctx); > + } > } > } Great catch. Thanks! Reviewed-by: Zack Rusin <zack.rusin@xxxxxxxxxxxx> z
Attachment:
smime.p7s
Description: S/MIME Cryptographic Signature