Re: Correct sequencing of usage of DRM writeback connector

Brian Starkey <brian.starkey@xxxxxxx> · Tue, 18 Jun 2024 10:10:40 +0100

Hi,

On Mon, Jun 17, 2024 at 10:48:47PM UTC, Dmitry Baryshkov wrote:
> On Mon, Jun 17, 2024 at 05:36:34PM GMT, Hoosier, Matt wrote:
> > >> >> >> There is a discussion ongoing over in the compositor world about the implication 
> > >> Hi,
> > >>
> > >> On Mon, Jun 17, 2024 at 05:16:36PM +0200, Daniel Vetter wrote:
> > >> >On Mon, Jun 17, 2024 at 01:41:59PM +0000, Hoosier, Matt wrote:
> > >> >> Hi,
> > >> >>
> > >> >> There is a discussion ongoing over in the compositor world about the implication of this cautionary wording found in the documentation for the DRM_MODE_CONNECTOR_WRITEBACK connectors:
> > >> >>
> > >> >> >  *  "WRITEBACK_OUT_FENCE_PTR":
> > >> >> >  *	Userspace can use this property to provide a pointer for the kernel to
> > >> >> >  *	fill with a sync_file file descriptor, which will signal once the
> > >> >> >  *	writeback is finished. The value should be the address of a 32-bit
> > >> >> >  *	signed integer, cast to a u64.
> > >> >> >  *	Userspace should wait for this fence to signal before making another
> > >> >> >  *	commit affecting any of the same CRTCs, Planes or Connectors.
> > >> >> >  *	**Failure to do so will result in undefined behaviour.**
> > >> >> >  *	For this reason it is strongly recommended that all userspace
> > >> >> >  *	applications making use of writeback connectors *always* retrieve an
> > >> >> >  *	out-fence for the commit and use it appropriately.
> > >> >> >  *	From userspace, this property will always read as zero.
> > >> >>
> > >> >> The question is whether it's realistic to hope that a DRM writeback
> > >> >> connector can produce results on every frame, and do so without dragging
> > >> >> down the frame-rate for the connector.
> > >> >>
> > >> >> The wording in the documentation above suggests that it is very likely
> > >> >> the fence fd won't signal userspace until after the vblank following the
> > >> >> scanout during which the writeback was applied (call that frame N). This
> > >> >> would mean that the compositor driving the connector would typically be
> > >> >> unable to legally queue a page flip for frame N+1.
> > >> >>
> > >> >> Is this the right interpretation? Is the writeback hardware typically
> > >> >> even designed with a streaming use-case in mind? Maybe it's just
> > >> >> intended for occasional static screenshots.
> > >> >
> > >> >So typically writeback hardware needs its separate crtc (at least the
> > >> >examples I know of) and doesn't make a lot of guarantees that it's fast
> > >> >enough for real time use. Since it's a separate crtc it shouldn't hold up
> > >> >the main composition loop, and so this should be all fine.
> > 
> > Hmm, I don't think this matches the implementation. What I see -- sometimes people call this "concurrent writeback" -- is that the writeback connector is wired directly to the CRTC that's feeding the regular connector whose picture is getting captured.
> > 
> > Something like (for frame N):
> > 
> > * DP-1
> >   * CRTC_ID = crtc-0
> > * plane-0
> >   * CRTC_ID = crtc-0
> >   * FB_ID = fb-0
> > 
> > * Writeback-1
> >   * CRTC_ID = crtc-0
> >   * WRITEBACK_FB_ID = fb-1
> >   * WRITEBACK_OUT_FENCE_PTR = <whatever>
> > 
> > 
> > Are you saying that that for frame N+1, we should switch DP-1 to use a different CRTC while the writeback is still busy being retired into fb-1 through crtc-0?

The "letter" of the API is that you need to wait for the Writeback-1
fence before making another commit to any of the resources - which
effectively means halving your update rate for crtc-0.

The fence might fire before vsync, so you might have a very small
window for getting a new commit in on the very next frame, but
realistically it's unlikely.

Logically, if the writeback is working line-by-line alongside scanout
then it won't have the last pixels until the end of the last line, so
it can't signal completion until at least the start of vblank.

> > 
> > >>
> > >> On Mali-DP and Komeda at least, you can use writeback on the same CRTC
> > >> that is driving a "real" display, and it should generally work. If the
> > >> writeback doesn't keep up then the HW will signal an error, but it was
> > >> designed to work in-sync with real scanout, on the same pipe.
> > >>
> > >> >
> > >> >If/when we have hardware and driver support where you can use the
> > >> >writeback connector as a real-time streamout kind of thing, then we need
> > >> >to change all this, because with the current implementation, there's
> > >> >indeed the possibility that funny things can happen if you ignore the
> > >> >notice (funny as in data corruption, not funny as the kernel crashes of
> > >> >course).
> > >>
> > >> Indeed, the wording was added (from what I remember from so long
> > >> ago...) because it sounded like different HW made very different
> > >> guarantees/non-guarantees about what data would be written when, so
> > >> perhaps you'd end up with some pixels from the next frame in your
> > >> buffer or something.
> > >>
> > >> Taking Mali-DP/Komeda again, the writeback configuration is latched
> > >> along with everything else, and writeback throughput permitting, it
> > >> should "just work" if you submit a new writeback every frame. It
> > >> drains out the last of the data during vblank, before starting on the
> > >> next frame. That doesn't help the "general case" though.
> > 
> > Are you saying that on hardware whose writeback implementation is
> > amenable, the drivers will generally fire the fence FD in time for
> > userspace to post a next frame to the real connector's CRTC for the
> > immediately following frame? Or are you just saying that some hardware
> > could support it, but that the DRM framework's insistence that
> > userspace waits until the fence fires might still make it artificially
> > too slow to drive the hardware to its full capacity?
> 

I'm saying that on Mali-DP and Komeda, you can submit another frame
*before* the Writeback-1 fence fires. On that HW, you should be able
to submit a frame including a new writeback FB, at full refresh rate,
and it should "just work."

The writeback code was written to permit this, but it's not
"discoverable" via the API whether it's OK on a particular device; and
the reason for the defensive wording is that some people said it
*wasn't* OK on theirs - I seem to remember VC4 writeback makes very
few timing guarantees.

Maybe it should be discoverable. I don't know how hard it would be to
describe everyone's constraints.

> Current DRM uAPI for writeback supports single-frame jobs. Userspace
> submits an FB, it waits for the fence, then submits next FB, etc. This
> is not well-suitable (nor it is defined in a way suitable) for the
> streamed operations, where you want to generate a stream of N frames per
> second. Even though userspace sets a mode which includes a particular
> frame rate, the WB API has no guarantee that the FPS will be or can be
> met.
> 
> In my opinion a proper uAPI for the streamed Writeback should be closer
> to v4l2, where you submit several buffers in advance together with the
> corresponding fences and then they are filled one by one and the
> corresponding fence is signalled to userspace. 
> 

At the time of introducing writeback connectors, we also discussed
having a more "streaming" API, or even having an actual V4L2 device
representing the frame stream. At the time we were most interested in
single frames for static-screen and testing purposes, and there wasn't
a consensus on how to make a streaming API, so we didn't do it.

> Note, in my humble opinion, it should be perfectly possible to setup
> writeback as a clone to the existing connector (if a clone mode is
> suppored by the hardware) and then to supply the jobs (FB+fence)
> occasionally rather than for each frame. Such userspace operation should
> not cause any degradation on the main connector.

This is fine and supported by the API. If the HW supports it, you can
have a CRTC scanning out to a real display, and then every Nth frame
add a writeback framebuffer to capture that frame.

IIRC you can attach a Writeback connector without a framebuffer to a
CRTC, to support exactly this use case.

So on Frame 0:

* DP-1
  * CRTC_ID = crtc-0
* plane-0
  * CRTC_ID = crtc-0
  * FB_ID = fb-0

* Writeback-1
  * CRTC_ID = crtc-0
  * WRITEBACK_FB_ID = 0 (no framebuffer)
  * WRITEBACK_OUT_FENCE_PTR = NULL

Then on Frame 'N' where you want a writeback, and a non-modeset
commit, simply add a framebuffer:

* DP-1
  * CRTC_ID = crtc-0
* plane-0
  * CRTC_ID = crtc-0
  * FB_ID = fb-1

* Writeback-1
  * CRTC_ID = crtc-0
  * WRITEBACK_FB_ID = fb-9
  * WRITEBACK_OUT_FENCE_PTR = <whatever>

Frame 'N+1' with no writeback, don't add a framebuffer:

* DP-1
  * CRTC_ID = crtc-0
* plane-0
  * CRTC_ID = crtc-0
  * FB_ID = fb-2

* Writeback-1
  * CRTC_ID = crtc-0
  * WRITEBACK_FB_ID = 0
  * WRITEBACK_OUT_FENCE_PTR = NULL

> 
> > 
> > I want to be a good citizen and do stuff by the book here. :-P
> > 

Yeah... so as you pointed out initially, if you've got a writeback
connector attached to your scanout CRTC, "by the book" means basically
reducing update rate. Obviously if you use an entirely separate CRTC
for writeback, they operate independently.

Thanks,
-Brian

> > >>
> > >> >
> > >> >If we already have devices where you can use writeback together with real
> > >> >outputs, then I guess that counts as an oopsie :-/
> > >>
> > >> Well "works fine" fits into the "undefined behaviour" bucket, just as
> > >> well as "corrupts your fb" does :-)
> > >>
> > >> -Brian
> > 
> 
> -- 
> With best wishes
> Dmitry