RE: Correct sequencing of usage of DRM writeback connector

"Hoosier, Matt" <Matt.Hoosier@xxxxxxxxxx> · Tue, 18 Jun 2024 13:57:39 +0000

> From: Brian Starkey <brian.starkey@xxxxxxx> 
> Sent: Tuesday, June 18, 2024 4:11 AM
> To: Dmitry Baryshkov <dmitry.baryshkov@xxxxxxxxxx>
> Cc: Hoosier, Matt <Matt.Hoosier@xxxxxxxxxx>; Daniel Vetter <daniel@xxxxxxxx>; dri-devel@xxxxxxxxxxxxxxxxxxxxx; Pekka Paalanen <pekka.paalanen@xxxxxxxxxxxxx>; nd@xxxxxxx
> Subject: Re: Correct sequencing of usage of DRM writeback connector
> 
> Hi,
> 
> On Mon, Jun 17, 2024 at 10:48:47PM UTC, Dmitry Baryshkov wrote:
> > On Mon, Jun 17, 2024 at 05:36:34PM GMT, Hoosier, Matt wrote:
> > > >> >> >> There is a discussion ongoing over in the compositor world about the implication 
> > > >> Hi,
> > > >>
> > > >> On Mon, Jun 17, 2024 at 05:16:36PM +0200, Daniel Vetter wrote:
> > > >> >On Mon, Jun 17, 2024 at 01:41:59PM +0000, Hoosier, Matt wrote:
> > > >> >> Hi,
> > > >> >>
> > > >> >> There is a discussion ongoing over in the compositor world about the implication of this cautionary wording found in the documentation for the DRM_MODE_CONNECTOR_WRITEBACK connectors:
> > > >> >>
> > > >> >> >  *  "WRITEBACK_OUT_FENCE_PTR":
> > > >> >> >  *	Userspace can use this property to provide a pointer for the kernel to
> > > >> >> >  *	fill with a sync_file file descriptor, which will signal once the
> > > >> >> >  *	writeback is finished. The value should be the address of a 32-bit
> > > >> >> >  *	signed integer, cast to a u64.
> > > >> >> >  *	Userspace should wait for this fence to signal before making another
> > > >> >> >  *	commit affecting any of the same CRTCs, Planes or Connectors.
> > > >> >> >  *	**Failure to do so will result in undefined behaviour.**
> > > >> >> >  *	For this reason it is strongly recommended that all userspace
> > > >> >> >  *	applications making use of writeback connectors *always* retrieve an
> > > >> >> >  *	out-fence for the commit and use it appropriately.
> > > >> >> >  *	From userspace, this property will always read as zero.
> > > >> >>
> > > >> >> The question is whether it's realistic to hope that a DRM writeback
> > > >> >> connector can produce results on every frame, and do so without dragging
> > > >> >> down the frame-rate for the connector.
> > > >> >>
> > > >> >> The wording in the documentation above suggests that it is very likely
> > > >> >> the fence fd won't signal userspace until after the vblank following the
> > > >> >> scanout during which the writeback was applied (call that frame N). This
> > > >> >> would mean that the compositor driving the connector would typically be
> > > >> >> unable to legally queue a page flip for frame N+1.
> > > >> >>
> > > >> >> Is this the right interpretation? Is the writeback hardware typically
> > > >> >> even designed with a streaming use-case in mind? Maybe it's just
> > > >> >> intended for occasional static screenshots.
> > > >> >
> > > >> >So typically writeback hardware needs its separate crtc (at least the
> > > >> >examples I know of) and doesn't make a lot of guarantees that it's fast
> > > >> >enough for real time use. Since it's a separate crtc it shouldn't hold up
> > > >> >the main composition loop, and so this should be all fine.
> > > 
> > > Hmm, I don't think this matches the implementation. What I see -- sometimes people call this "concurrent writeback" -- is that the writeback connector is wired directly to the CRTC that's feeding the regular connector whose picture is getting captured.
> > > 
> > > Something like (for frame N):
> > > 
> > > * DP-1
> > >   * CRTC_ID = crtc-0
> > > * plane-0
> > >   * CRTC_ID = crtc-0
> > >   * FB_ID = fb-0
> > > 
> > > * Writeback-1
> > >   * CRTC_ID = crtc-0
> > >   * WRITEBACK_FB_ID = fb-1
> > >   * WRITEBACK_OUT_FENCE_PTR = <whatever>
> > > 
> > > 
> > > Are you saying that that for frame N+1, we should switch DP-1 to use a different CRTC while the writeback is still busy being retired into fb-1 through crtc-0?
> 
> The "letter" of the API is that you need to wait for the Writeback-1
> fence before making another commit to any of the resources - which
> effectively means halving your update rate for crtc-0.
> 
> The fence might fire before vsync, so you might have a very small
> window for getting a new commit in on the very next frame, but
> realistically it's unlikely.
> 
> Logically, if the writeback is working line-by-line alongside scanout
> then it won't have the last pixels until the end of the last line, so
> it can't signal completion until at least the start of vblank.
> 
> > > 
> > > >>
> > > >> On Mali-DP and Komeda at least, you can use writeback on the same CRTC
> > > >> that is driving a "real" display, and it should generally work. If the
> > > >> writeback doesn't keep up then the HW will signal an error, but it was
> > > >> designed to work in-sync with real scanout, on the same pipe.
> > > >>
> > > >> >
> > > >> >If/when we have hardware and driver support where you can use the
> > > >> >writeback connector as a real-time streamout kind of thing, then we need
> > > >> >to change all this, because with the current implementation, there's
> > > >> >indeed the possibility that funny things can happen if you ignore the
> > > >> >notice (funny as in data corruption, not funny as the kernel crashes of
> > > >> >course).
> > > >>
> > > >> Indeed, the wording was added (from what I remember from so long
> > > >> ago...) because it sounded like different HW made very different
> > > >> guarantees/non-guarantees about what data would be written when, so
> > > >> perhaps you'd end up with some pixels from the next frame in your
> > > >> buffer or something.
> > > >>
> > > >> Taking Mali-DP/Komeda again, the writeback configuration is latched
> > > >> along with everything else, and writeback throughput permitting, it
> > > >> should "just work" if you submit a new writeback every frame. It
> > > >> drains out the last of the data during vblank, before starting on the
> > > >> next frame. That doesn't help the "general case" though.
> > > 
> > > Are you saying that on hardware whose writeback implementation is
> > > amenable, the drivers will generally fire the fence FD in time for
> > > userspace to post a next frame to the real connector's CRTC for the
> > > immediately following frame? Or are you just saying that some hardware
> > > could support it, but that the DRM framework's insistence that
> > > userspace waits until the fence fires might still make it artificially
> > > too slow to drive the hardware to its full capacity?
> > 
> 
> I'm saying that on Mali-DP and Komeda, you can submit another frame
> *before* the Writeback-1 fence fires. On that HW, you should be able
> to submit a frame including a new writeback FB, at full refresh rate,
> and it should "just work."
> 
> The writeback code was written to permit this, but it's not
> "discoverable" via the API whether it's OK on a particular device; and
> the reason for the defensive wording is that some people said it
> *wasn't* OK on theirs - I seem to remember VC4 writeback makes very
> few timing guarantees.
> 
> Maybe it should be discoverable. I don't know how hard it would be to
> describe everyone's constraints.
> 
> > Current DRM uAPI for writeback supports single-frame jobs. Userspace
> > submits an FB, it waits for the fence, then submits next FB, etc. This
> > is not well-suitable (nor it is defined in a way suitable) for the
> > streamed operations, where you want to generate a stream of N frames per
> > second. Even though userspace sets a mode which includes a particular
> > frame rate, the WB API has no guarantee that the FPS will be or can be
> > met.
> > 
> > In my opinion a proper uAPI for the streamed Writeback should be closer
> > to v4l2, where you submit several buffers in advance together with the
> > corresponding fences and then they are filled one by one and the
> > corresponding fence is signalled to userspace. 
> > 
> 
> At the time of introducing writeback connectors, we also discussed
> having a more "streaming" API, or even having an actual V4L2 device
> representing the frame stream. At the time we were most interested in
> single frames for static-screen and testing purposes, and there wasn't
> a consensus on how to make a streaming API, so we didn't do it.
> 
> > Note, in my humble opinion, it should be perfectly possible to setup
> > writeback as a clone to the existing connector (if a clone mode is
> > suppored by the hardware) and then to supply the jobs (FB+fence)
> > occasionally rather than for each frame. Such userspace operation should
> > not cause any degradation on the main connector.
> 
> This is fine and supported by the API. If the HW supports it, you can
> have a CRTC scanning out to a real display, and then every Nth frame
> add a writeback framebuffer to capture that frame.
> 
> IIRC you can attach a Writeback connector without a framebuffer to a
> CRTC, to support exactly this use case.
> 
> So on Frame 0:
> 
> * DP-1
>   * CRTC_ID = crtc-0
> * plane-0
>   * CRTC_ID = crtc-0
>   * FB_ID = fb-0
> 
> * Writeback-1
>   * CRTC_ID = crtc-0
>   * WRITEBACK_FB_ID = 0 (no framebuffer)
>   * WRITEBACK_OUT_FENCE_PTR = NULL
> 
> Then on Frame 'N' where you want a writeback, and a non-modeset
> commit, simply add a framebuffer:
> 
> * DP-1
>   * CRTC_ID = crtc-0
> * plane-0
>   * CRTC_ID = crtc-0
>   * FB_ID = fb-1
> 
> * Writeback-1
>   * CRTC_ID = crtc-0
>   * WRITEBACK_FB_ID = fb-9
>   * WRITEBACK_OUT_FENCE_PTR = <whatever>
> 
> Frame 'N+1' with no writeback, don't add a framebuffer:
> 
> * DP-1
>   * CRTC_ID = crtc-0
> * plane-0
>   * CRTC_ID = crtc-0
>   * FB_ID = fb-2
> 
> * Writeback-1
>   * CRTC_ID = crtc-0
>   * WRITEBACK_FB_ID = 0
>   * WRITEBACK_OUT_FENCE_PTR = NULL
> 
> > 
> > > 
> > > I want to be a good citizen and do stuff by the book here. :-P
> > > 
> 
> Yeah... so as you pointed out initially, if you've got a writeback
> connector attached to your scanout CRTC, "by the book" means basically
> reducing update rate. Obviously if you use an entirely separate CRTC
> for writeback, they operate independently.

Okay, got it.

Thanks to everybody for the help interpreting the right way to use this API for now.

> 
> Thanks,
> -Brian
> 
> > > >>
> > > >> >
> > > >> >If we already have devices where you can use writeback together with real
> > > >> >outputs, then I guess that counts as an oopsie :-/
> > > >>
> > > >> Well "works fine" fits into the "undefined behaviour" bucket, just as
> > > >> well as "corrupts your fb" does :-)
> > > >>
> > > >> -Brian
> > > 
> > 
> > -- 
> > With best wishes
> > Dmitry
>