Re: Correct sequencing of usage of DRM writeback connector

Daniel Vetter <daniel@xxxxxxxx> · Fri, 21 Jun 2024 19:18:34 +0200

On Tue, Jun 18, 2024 at 12:48:13PM +0300, Dmitry Baryshkov wrote:
> On Tue, 18 Jun 2024 at 12:33, Daniel Vetter <daniel@xxxxxxxx> wrote:
> >
> > On Mon, Jun 17, 2024 at 10:52:27PM +0300, Dmitry Baryshkov wrote:
> > > On Mon, Jun 17, 2024 at 11:28:35AM GMT, Abhinav Kumar wrote:
> > > > Hi
> > > >
> > > > On 6/17/2024 9:54 AM, Brian Starkey wrote:
> > > > > Hi,
> > > > >
> > > > > On Mon, Jun 17, 2024 at 05:16:36PM +0200, Daniel Vetter wrote:
> > > > > > On Mon, Jun 17, 2024 at 01:41:59PM +0000, Hoosier, Matt wrote:
> 
> > > > > > If/when we have hardware and driver support where you can use the
> > > > > > writeback connector as a real-time streamout kind of thing, then we need
> > > > > > to change all this, because with the current implementation, there's
> > > > > > indeed the possibility that funny things can happen if you ignore the
> > > > > > notice (funny as in data corruption, not funny as the kernel crashes of
> > > > > > course).
> > > > >
> > > > > Indeed, the wording was added (from what I remember from so long
> > > > > ago...) because it sounded like different HW made very different
> > > > > guarantees/non-guarantees about what data would be written when, so
> > > > > perhaps you'd end up with some pixels from the next frame in your
> > > > > buffer or something.
> > > > >
> > > > > Taking Mali-DP/Komeda again, the writeback configuration is latched
> > > > > along with everything else, and writeback throughput permitting, it
> > > > > should "just work" if you submit a new writeback every frame. It
> > > > > drains out the last of the data during vblank, before starting on the
> > > > > next frame. That doesn't help the "general case" though.
> > > > >
> > > >
> > > > Would it be fair to summarize it like below:
> > > >
> > > > 1) If the same CRTC is shared with the real time display, then the hardware
> > > > is expected to fire this every frame so userspace should wait till this is
> > > > signaled.
> > >
> > > As I wrote in response to another email in this thread, IMO existing
> > > uAPI doesn't fully allow this. There is no way to enforce 'vblank'
> > > handling onto the userspace. So userspace should be able to supply at
> > > least two buffers and then after the vblank it should be able to enqueue
> > > the next buffer, while the filled buffer is automatically dequeued by
> > > the driver and is not used for further image output.
> >
> > Yeah if you want streaming writeback we need a queue depth of at least 2
> > in the kms api. Will help a lot on all hardware, but on some it's required
> > because the time when the writeback buffer is fully flushed is after the
> > point of no return for the next frame (which is when the vblank event is
> > supposed to go out).
> >
> > I think over the years we've slowly inched forward to make at least the
> > drm code safe for a queue depth of 2 in the atomic machinery, but the
> > writeback and driver code probably needs a bunch of work.
> 
> Do you mean handling the queue by allowing userspace to commit 'next' FB_ID?
> 
> I was leaning towards extending the uAPI with something like explicit
> WRITEBACK_FB_ID_QUEUED and WRITEBACK_OUT_FENCE_PTR_QUEUED properties.
> This way once the fence has been reached, the drm_writeback might
> automatically put the old framebuffer, move _QUEUED to normal props
> and then signal the userspace. This way the single-frame writeback
> drivers can support the old API, while allowing cloned-writeback
> drivers to implement the streaming approach. Also, this allows drivers
> to do clever tricks, like forbidding the _QUEUED operation if the
> refresh rate for the writeback connector is too high.

Eh I think we should just allow atomic commits with a queue depth > 1. I
think that's both the cleanest uapi, and also the cleanest on the driver
side, since I just don't want to think about what happens when we have
multiple commits going on at the same time on the same crtc.

On the core/helper side we've tried to get there slowly by explicitly
accessing old/new state and never accessing plane/crtc->state directly.
That's really all that should be needed.

I guess to make things easier for drivers we could do an intermediate uapi
where we allow queue depth, but only for commits that don't require a
modeset, since those tend to be much simpler. Or maybe even allow the
driver to fully control where it can handle a queue depth > 1.

Essentially roughly this, minus all the safety checks we'll probably need
to add in various places. Note: extremely incomplete :-)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c b/drivers/gpu/drm/drm_atomic_helper.c
index 63ee81e478b9..31c4e124eb5a 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -2136,7 +2136,7 @@ static int stall_checks(struct drm_crtc *crtc, bool nonblock)
 	spin_lock(&crtc->commit_lock);
 	i = 0;
 	list_for_each_entry(commit, &crtc->commit_list, commit_entry) {
-		if (i == 0) {
+		if (i == 1) {
 			completed = try_wait_for_completion(&commit->flip_done);
 			/*
 			 * Userspace is not allowed to get ahead of the previous
@@ -2150,7 +2150,7 @@ static int stall_checks(struct drm_crtc *crtc, bool nonblock)
 
 				return -EBUSY;
 			}
-		} else if (i == 1) {
+		} else if (i == 2) {
 			stall_commit = drm_crtc_commit_get(commit);
 			break;
 		}
@@ -3015,7 +3015,7 @@ int drm_atomic_helper_swap_state(struct drm_atomic_state *state,
 	struct drm_private_obj *obj;
 	struct drm_private_state *old_obj_state, *new_obj_state;
 
-	if (stall) {
+	if (stall && 0) {
 		/*
 		 * We have to stall for hw_done here before
 		 * drm_atomic_helper_wait_for_dependencies() because flip

But I hope it sketches the idea at least.

Cheers, Sima

> 
> 
> > -Sima
> >
> > >
> > > >
> > > > 2) If a different CRTC is used for the writeback, then the composition loop
> > > > for the real time display should not block on this unless its a mirroring
> > > > use-case, then we will be throttled by the lowest refresh rate anyway.
> > >
> > > what is mirroring in this case? You have specified that a different CRTC
> > > is being used.
> > >
> > > >
> > > > > >
> > > > > > If we already have devices where you can use writeback together with real
> > > > > > outputs, then I guess that counts as an oopsie :-/
> > > > >
> > > > > Well "works fine" fits into the "undefined behaviour" bucket, just as
> > > > > well as "corrupts your fb" does :-)
> 
> 
> -- 
> With best wishes
> Dmitry

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch