On Tue, Sep 26, 2023 at 01:05:49PM -0400, Ray Strode wrote:
> From: Ray Strode <rstrode@xxxxxxxxxx>
>
> A drm atomic commit can be quite slow on some hardware. It can lead
> to a lengthy queue of commands that need to get processed and waited
> on before control can go back to user space.
>
> If user space is a real-time thread, that delay can have severe
> consequences, leading to the process getting killed for exceeding
> rlimits.
>
> This commit addresses the problem by always running the slow part of
> a commit on a workqueue, separated from the task initiating the
> commit.
>
> This change makes the nonblocking and blocking paths work in the same way,
> and as a result allows the task to sleep and not use up its
> RLIMIT_RTTIME allocation.
>
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2861
> Signed-off-by: Ray Strode <rstrode@xxxxxxxxxx>

So imo the trouble with this is that we suddenly start to make
realtime/cpu usage guarantees in the atomic ioctl. That's a _huge_ uapi
change, because even limited to the case of !ALLOW_MODESET we make
best-effort guarantees at best.

And some drivers (again amd's dc) spend a ton of cpu time recomputing
state even for pure plane changes without any crtc changes like dpms
on/off (at least I remember some bug reports about that). And that state
recomputation has to happen synchronously, because it always influences
the ioctl errno return value.

My take is that you're papering over a "the driver is too slow/wastes too
much cpu time" performance problem here. We should fix the driver, if
that's possible.

Another option would be for userspace to drop realtime priorities for
these known-slow operations. And right now _all_ kms operations are
potentially cpu and real-time wasters; the entire uapi is best effort.

We can also try to change the atomic uapi to give some hard real-time
guarantees so that running compositors as SCHED_RT is possible, but that

- means a very serious stream of bugs to fix all over

- therefore needs some very wide buy-in from drivers that they're
  willing to make this guarantee

- probably needs some really carefully carved out limitations, because
  there's imo flat-out no way we'll make all atomic ioctls hard time
  limit bound

Also, as König has pointed out, you can roll this duct-tape out in
userspace by making the commit non-blocking and immediately waiting for
the fences (roughly as sketched below).

One thing I didn't see mentioned is that there's a very subtle uapi
difference between non-blocking and blocking:

- non-blocking is not allowed to get ahead of the previous commit, and
  will return EBUSY in that case. See the comment in
  drm_atomic_helper_commit()

- blocking otoh will just block until any previous pending commit has
  finished

Not taking that into account in your patch here breaks uapi, because
userspace will suddenly get EBUSY when it doesn't expect that.
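For reference, the userspace duct-tape version would look roughly like
the below. Untested sketch against libdrm's atomic API with error
handling mostly trimmed; "req", "crtc_id" and "out_fence_prop_id" (the
id of the crtc's OUT_FENCE_PTR property) are placeholder names the
caller is assumed to have set up already:

#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/*
 * Emulate a blocking commit in userspace: nonblocking commit with an
 * out-fence, then sleep on the fence fd instead of inside the ioctl.
 */
static int commit_and_wait(int fd, drmModeAtomicReq *req,
			   uint32_t crtc_id, uint32_t out_fence_prop_id)
{
	int out_fence = -1;
	int ret;

	/* Ask the kernel to hand back a sync_file fd for this commit. */
	drmModeAtomicAddProperty(req, crtc_id, out_fence_prop_id,
				 (uint64_t)(uintptr_t)&out_fence);

	/*
	 * Unlike a blocking commit, this can fail with -EBUSY if a
	 * previous nonblocking commit is still pending, so the caller has
	 * to handle that (e.g. by waiting on the previous out-fence
	 * before committing again).
	 */
	ret = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_NONBLOCK, NULL);
	if (ret < 0)
		return ret;

	if (out_fence >= 0) {
		struct pollfd pfd = { .fd = out_fence, .events = POLLIN };

		/* The "blocking" part: sleep until the commit completed. */
		while (poll(&pfd, 1, -1) < 0 && errno == EINTR)
			;
		close(out_fence);
	}

	return 0;
}

The wait then happens in a plain poll() on the sync_file fd, where the
task just sleeps, and the EBUSY case stays an explicit thing the caller
handles instead of silently changing the blocking ioctl's semantics.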
Cheers, Sima

> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 292e38eb6218..1a1e68d98d38 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -2028,64 +2028,63 @@ int drm_atomic_helper_commit(struct drm_device *dev,
>  	 * This is the point of no return - everything below never fails except
>  	 * when the hw goes bonghits. Which means we can commit the new state on
>  	 * the software side now.
>  	 */
>
>  	ret = drm_atomic_helper_swap_state(state, true);
>  	if (ret)
>  		goto err;
>
>  	/*
>  	 * Everything below can be run asynchronously without the need to grab
>  	 * any modeset locks at all under one condition: It must be guaranteed
>  	 * that the asynchronous work has either been cancelled (if the driver
>  	 * supports it, which at least requires that the framebuffers get
>  	 * cleaned up with drm_atomic_helper_cleanup_planes()) or completed
>  	 * before the new state gets committed on the software side with
>  	 * drm_atomic_helper_swap_state().
>  	 *
>  	 * This scheme allows new atomic state updates to be prepared and
>  	 * checked in parallel to the asynchronous completion of the previous
>  	 * update. Which is important since compositors need to figure out the
>  	 * composition of the next frame right after having submitted the
>  	 * current layout.
>  	 *
>  	 * NOTE: Commit work has multiple phases, first hardware commit, then
>  	 * cleanup. We want them to overlap, hence need system_unbound_wq to
>  	 * make sure work items don't artificially stall on each another.
>  	 */
>
>  	drm_atomic_state_get(state);
> -	if (nonblock)
> -		queue_work(system_unbound_wq, &state->commit_work);
> -	else
> -		commit_tail(state);
> +	queue_work(system_unbound_wq, &state->commit_work);
> +	if (!nonblock)
> +		flush_work(&state->commit_work);
>
>  	return 0;
>
>  err:
>  	drm_atomic_helper_cleanup_planes(dev, state);
>  	return ret;
>  }
>  EXPORT_SYMBOL(drm_atomic_helper_commit);
>
>  /**
>   * DOC: implementing nonblocking commit
>   *
>   * Nonblocking atomic commits should use struct &drm_crtc_commit to sequence
>   * different operations against each another. Locks, especially struct
>   * &drm_modeset_lock, should not be held in worker threads or any other
>   * asynchronous context used to commit the hardware state.
>   *
>   * drm_atomic_helper_commit() implements the recommended sequence for
>   * nonblocking commits, using drm_atomic_helper_setup_commit() internally:
>   *
>   * 1. Run drm_atomic_helper_prepare_planes(). Since this can fail and we
>   * need to propagate out of memory/VRAM errors to userspace, it must be called
>   * synchronously.
>   *
>   * 2. Synchronize with any outstanding nonblocking commit worker threads which
>   * might be affected by the new state update. This is handled by
>   * drm_atomic_helper_setup_commit().
>   *
>   * Asynchronous workers need to have sufficient parallelism to be able to run
>   * different atomic commits on different CRTCs in parallel. The simplest way to
> --
> 2.41.0
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch