On Tue, Sep 26, 2023 at 01:05:49PM -0400, Ray Strode wrote:
> From: Ray Strode <rstrode@xxxxxxxxxx>
>
> A drm atomic commit can be quite slow on some hardware. It can lead
> to a lengthy queue of commands that need to get processed and waited
> on before control can go back to user space.
>
> If user space is a real-time thread, that delay can have severe
> consequences, leading to the process getting killed for exceeding
> rlimits.
>
> This commit addresses the problem by always running the slow part of
> a commit on a workqueue, separated from the task initiating the
> commit.
>
> This change makes the nonblocking and blocking paths work in the same way,
> and as a result allows the task to sleep and not use up its
> RLIMIT_RTTIME allocation.
>
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2861
> Signed-off-by: Ray Strode <rstrode@xxxxxxxxxx>

So imo the trouble with this is that we suddenly start to make
realtime/cpu usage guarantees in the atomic ioctl. That's a _huge_ uapi
change, because even limited to the case of !ALLOW_MODESET we make
best-effort guarantees at best.

And some drivers (again amd's dc) spend a ton of cpu time recomputing
state even for pure plane changes without any crtc changes like dpms
on/off (at least I remember some bug reports about that). And that state
recomputation has to happen synchronously, because it always influences
the ioctl errno return value.

My take is that you're papering over a "the driver is too slow/wastes too
much cpu time" performance problem here. We should fix the driver, if
that's possible.

Another option would be for userspace to drop realtime priorities for
these known-slow operations. And right now _all_ kms operations are
potentially cpu and real-time wasters; the entire uapi is best effort.

We can also try to change the atomic uapi to give some hard real-time
guarantees so that running compositors as SCHED_RT is possible, but that

- means a very serious stream of bugs to fix all over

- therefore needs some very wide buy-in from drivers that they're
  willing to make this guarantee

- probably needs some really carefully carved out limitations, because
  there's imo flat-out no way we'll make all atomic ioctls hard time
  limit bound

Also, as König has pointed out, you can roll this duct-tape out in
userspace by making the commit non-blocking and immediately waiting for
the fences (roughly as sketched below).

One thing I didn't see mentioned is that there's a very subtle uapi
difference between non-blocking and blocking:

- non-blocking is not allowed to get ahead of the previous commit, and
  will return EBUSY in that case. See the comment in
  drm_atomic_helper_commit()

- blocking otoh will just block until any previous pending commit has
  finished

Not taking that into account in your patch here breaks uapi, because
userspace will suddenly get EBUSY when it doesn't expect that.
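For reference, the userspace duct-tape version would look roughly like
the below. Untested sketch against libdrm's atomic API with error
handling mostly trimmed; "req", "crtc_id" and "out_fence_prop_id" (the
id of the crtc's OUT_FENCE_PTR property) are placeholder names the
caller is assumed to have set up already:

#include <errno.h>
#include <poll.h>
#include <stdint.h>
#include <unistd.h>
#include <xf86drm.h>
#include <xf86drmMode.h>

/*
 * Emulate a blocking commit in userspace: nonblocking commit with an
 * out-fence, then sleep on the fence fd instead of inside the ioctl.
 */
static int commit_and_wait(int fd, drmModeAtomicReq *req,
			   uint32_t crtc_id, uint32_t out_fence_prop_id)
{
	int out_fence = -1;
	int ret;

	/* Ask the kernel to hand back a sync_file fd for this commit. */
	drmModeAtomicAddProperty(req, crtc_id, out_fence_prop_id,
				 (uint64_t)(uintptr_t)&out_fence);

	/*
	 * Unlike a blocking commit, this can fail with -EBUSY if a
	 * previous nonblocking commit is still pending, so the caller has
	 * to handle that (e.g. by waiting on the previous out-fence
	 * before committing again).
	 */
	ret = drmModeAtomicCommit(fd, req, DRM_MODE_ATOMIC_NONBLOCK, NULL);
	if (ret < 0)
		return ret;

	if (out_fence >= 0) {
		struct pollfd pfd = { .fd = out_fence, .events = POLLIN };

		/* The "blocking" part: sleep until the commit completed. */
		while (poll(&pfd, 1, -1) < 0 && errno == EINTR)
			;
		close(out_fence);
	}

	return 0;
}

The wait then happens in a plain poll() on the sync_file fd, where the
task just sleeps, and the EBUSY case stays an explicit thing the caller
handles instead of silently changing the blocking ioctl's semantics.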
Cheers, Sima

> ---
>  drivers/gpu/drm/drm_atomic_helper.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
> index 292e38eb6218..1a1e68d98d38 100644
> --- a/drivers/gpu/drm/drm_atomic_helper.c
> +++ b/drivers/gpu/drm/drm_atomic_helper.c
> @@ -2028,64 +2028,63 @@ int drm_atomic_helper_commit(struct drm_device *dev,
>  	 * This is the point of no return - everything below never fails except
>  	 * when the hw goes bonghits. Which means we can commit the new state on
>  	 * the software side now.
>  	 */
>
>  	ret = drm_atomic_helper_swap_state(state, true);
>  	if (ret)
>  		goto err;
>
>  	/*
>  	 * Everything below can be run asynchronously without the need to grab
>  	 * any modeset locks at all under one condition: It must be guaranteed
>  	 * that the asynchronous work has either been cancelled (if the driver
>  	 * supports it, which at least requires that the framebuffers get
>  	 * cleaned up with drm_atomic_helper_cleanup_planes()) or completed
>  	 * before the new state gets committed on the software side with
>  	 * drm_atomic_helper_swap_state().
>  	 *
>  	 * This scheme allows new atomic state updates to be prepared and
>  	 * checked in parallel to the asynchronous completion of the previous
>  	 * update. Which is important since compositors need to figure out the
>  	 * composition of the next frame right after having submitted the
>  	 * current layout.
>  	 *
>  	 * NOTE: Commit work has multiple phases, first hardware commit, then
>  	 * cleanup. We want them to overlap, hence need system_unbound_wq to
>  	 * make sure work items don't artificially stall on each another.
>  	 */
>
>  	drm_atomic_state_get(state);
> -	if (nonblock)
> -		queue_work(system_unbound_wq, &state->commit_work);
> -	else
> -		commit_tail(state);
> +	queue_work(system_unbound_wq, &state->commit_work);
> +	if (!nonblock)
> +		flush_work(&state->commit_work);
>
>  	return 0;
>
>  err:
>  	drm_atomic_helper_cleanup_planes(dev, state);
>  	return ret;
>  }
>  EXPORT_SYMBOL(drm_atomic_helper_commit);
>
>  /**
>   * DOC: implementing nonblocking commit
>   *
>   * Nonblocking atomic commits should use struct &drm_crtc_commit to sequence
>   * different operations against each another. Locks, especially struct
>   * &drm_modeset_lock, should not be held in worker threads or any other
>   * asynchronous context used to commit the hardware state.
>   *
>   * drm_atomic_helper_commit() implements the recommended sequence for
>   * nonblocking commits, using drm_atomic_helper_setup_commit() internally:
>   *
>   * 1. Run drm_atomic_helper_prepare_planes(). Since this can fail and we
>   * need to propagate out of memory/VRAM errors to userspace, it must be called
>   * synchronously.
>   *
>   * 2. Synchronize with any outstanding nonblocking commit worker threads which
>   * might be affected by the new state update. This is handled by
>   * drm_atomic_helper_setup_commit().
>   *
>   * Asynchronous workers need to have sufficient parallelism to be able to run
>   * different atomic commits on different CRTCs in parallel. The simplest way to
> --
> 2.41.0
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch