Re: Correct sequencing of usage of DRM writeback connector

Daniel Vetter <daniel@xxxxxxxx> · Fri, 21 Jun 2024 18:23:28 +0200

On Tue, Jun 18, 2024 at 07:10:58PM -0700, Abhinav Kumar wrote:
> 
> 
> On 6/18/2024 2:33 AM, Daniel Vetter wrote:
> > On Mon, Jun 17, 2024 at 10:52:27PM +0300, Dmitry Baryshkov wrote:
> > > On Mon, Jun 17, 2024 at 11:28:35AM GMT, Abhinav Kumar wrote:
> > > > Hi
> > > > 
> > > > On 6/17/2024 9:54 AM, Brian Starkey wrote:
> > > > > Hi,
> > > > > 
> > > > > On Mon, Jun 17, 2024 at 05:16:36PM +0200, Daniel Vetter wrote:
> > > > > > On Mon, Jun 17, 2024 at 01:41:59PM +0000, Hoosier, Matt wrote:
> > > > > > > Hi,
> > > > > > > 
> > > > > > > There is a discussion ongoing over in the compositor world about the implications of this cautionary wording found in the documentation for the DRM_MODE_CONNECTOR_WRITEBACK connectors:
> > > > > > > 
> > > > > > > >    *  "WRITEBACK_OUT_FENCE_PTR":
> > > > > > > >    *	Userspace can use this property to provide a pointer for the kernel to
> > > > > > > >    *	fill with a sync_file file descriptor, which will signal once the
> > > > > > > >    *	writeback is finished. The value should be the address of a 32-bit
> > > > > > > >    *	signed integer, cast to a u64.
> > > > > > > >    *	Userspace should wait for this fence to signal before making another
> > > > > > > >    *	commit affecting any of the same CRTCs, Planes or Connectors.
> > > > > > > >    *	**Failure to do so will result in undefined behaviour.**
> > > > > > > >    *	For this reason it is strongly recommended that all userspace
> > > > > > > >    *	applications making use of writeback connectors *always* retrieve an
> > > > > > > >    *	out-fence for the commit and use it appropriately.
> > > > > > > >    *	From userspace, this property will always read as zero.
> > > > > > > 
> > > > > > > The question is whether it's realistic to hope that a DRM writeback
> > > > > > > connector can produce results on every frame, and do so without dragging
> > > > > > > down the frame-rate for the connector.
> > > > > > > 
> > > > > > > The wording in the documentation above suggests that it is very likely
> > > > > > > the fence fd won't signal userspace until after the vblank following the
> > > > > > > scanout during which the writeback was applied (call that frame N). This
> > > > > > > would mean that the compositor driving the connector would typically be
> > > > > > > unable to legally queue a page flip for frame N+1.
> > > > > > > 
> > > > > > > Is this the right interpretation? Is the writeback hardware typically
> > > > > > > even designed with a streaming use-case in mind? Maybe it's just
> > > > > > > intended for occasional static screenshots.
> > > > > > 
> > > > > > So typically writeback hardware needs its separate crtc (at least the
> > > > > > examples I know of) and doesn't make a lot of guarantees that it's fast
> > > > > > enough for real time use. Since it's a separate crtc it shouldn't hold up
> > > > > > the main composition loop, and so this should be all fine.
> > > > > 
> > > > > On Mali-DP and Komeda at least, you can use writeback on the same CRTC
> > > > > that is driving a "real" display, and it should generally work. If the
> > > > > writeback doesn't keep up then the HW will signal an error, but it was
> > > > > designed to work in-sync with real scanout, on the same pipe.
> > > > > 
> > > > 
> > > > Same with MSM hardware. You can use writeback with same CRTC that is driving
> > > > a "real" display and yes we call it concurrent writeback. So I think it is
> > > > correct in the documentation to expect to wait till this is signaled if the
> > > > same CRTC is being used.
> > 
> > TIL
> > 
> > > > > > If/when we have hardware and driver support where you can use the
> > > > > > writeback connector as a real-time streamout kind of thing, then we need
> > > > > > to change all this, because with the current implementation, there's
> > > > > > indeed the possibility that funny things can happen if you ignore the
> > > > > > notice (funny as in data corruption, not funny as in kernel crashes, of
> > > > > > course).
> > > > > 
> > > > > Indeed, the wording was added (from what I remember from so long
> > > > > ago...) because it sounded like different HW made very different
> > > > > guarantees/non-guarantees about what data would be written when, so
> > > > > perhaps you'd end up with some pixels from the next frame in your
> > > > > buffer or something.
> > > > > 
> > > > > Taking Mali-DP/Komeda again, the writeback configuration is latched
> > > > > along with everything else, and writeback throughput permitting, it
> > > > > should "just work" if you submit a new writeback every frame. It
> > > > > drains out the last of the data during vblank, before starting on the
> > > > > next frame. That doesn't help the "general case" though.
> > > > > 
> > > > 
> > > > Would it be fair to summarize it like below:
> > > > 
> > > > 1) If the same CRTC is shared with the real time display, then the hardware
> > > > is expected to fire this every frame so userspace should wait till this is
> > > > signaled.
> > > 
> > > As I wrote in response to another email in this thread, IMO existing
> > > uAPI doesn't fully allow this. There is no way to enforce 'vblank'
> > > handling onto the userspace. So userspace should be able to supply at
> > > least two buffers and then after the vblank it should be able to enqueue
> > > the next buffer, while the filled buffer is automatically dequeued by
> > > the driver and is not used for further image output.
> > 
> 
> Sorry for the late response. What I meant was, if we are using concurrent
> writeback with the real time display, it should be capable of running at the
> same speed as the real time display. I do not have the numbers to share but
> at least that's the expectation.
> 
> But yes, I do admit that the current UAPI does not fully allow having a queue
> depth for WB FBs. And having it will help us.
> 
> > Yeah if you want streaming writeback we need a queue depth of at least 2
> > in the kms api. Will help a lot on all hardware, but on some it's required
> > because the time when the writeback buffer is fully flushed is after the
> > point of no return for the next frame (which is when the vblank event is
> > supposed to go out).
> > 
> > I think over the years we've slowly inched forward to make at least the
> > drm code safe for a queue depth of 2 in the atomic machinery, but the
> > writeback and driver code probably needs a bunch of work.
> > -Sima
> > 
> > > 
> > > > 
> > > > 2) If a different CRTC is used for the writeback, then the composition loop
> > > > for the real time display should not block on this unless it's a mirroring
> > > > use-case, in which case we will be throttled by the lowest refresh rate anyway.
> > > 
> > > what is mirroring in this case? You have specified that a different CRTC
> > > is being used.
> > > 
> 
> Mirroring could be defined in two ways:
> 
> 1) In clone mode, the WB runs at the same rate as the real time display,
> so if we mirror the content this way, the same CRTC is used.
> 
> 2) Let's say I want to mirror my content to a wifi display, but the
> end-monitor runs at a different resolution and fps. Then I cannot use
> clone mode, because the CRTC that the writeback is using will be
> programmed for a different mode than the real time display.
> 
> In the second case, it is still mirroring the content but with a different
> CRTC, so it will be slowed down by the slowest display; otherwise the
> displays will go out of sync. This is what I meant by this use-case.

A separate CRTC should work, I think, because you can run the 2 queues in
parallel. The issue is only with the single crtc use-case because:

- the writeback finishes only shortly after the next vblank period has
  started (how long depends upon how the hw flushes out writebacks).

- at least on some hardware you need to submit the next kms state _before_
  the vblank period has started. And even on hw/drivers where this is not
  the case, only having the vblank window to submit the next kms atomic
  state is really a bit too small.
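The two constraints above can be put into a toy timing model. This is a sketch under stated assumptions, not anything the kernel computes: a fixed vblank period, a writeback whose out-fence signals flush_us after the next vblank starts, and a commit deadline lead_us before each vblank. All names and numbers are hypothetical. With one loop that waits on each out-fence (queue depth 1), any non-zero flush or lead pushes the next writeback at least one vblank later. With a hypothetical queue depth q (the depth-of-2 idea from earlier in the thread), committing frame m only needs the fence of frame m - q, so a writeback lands every frame iff flush + lead <= (q - 1) * period.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy single-CRTC timing model. All inputs are hypothetical values in
 * microseconds, not anything reported by the kernel:
 *   period_us: interval between vblanks
 *   flush_us:  how long after the *next* vblank starts the writeback of the
 *              current frame is fully flushed (when its out-fence signals)
 *   lead_us:   how long before a vblank a commit must be submitted to latch
 */

/* Ceiling division; b must be > 0. */
static uint64_t div_round_up(uint64_t a, uint64_t b)
{
    return (a + b - 1) / b;
}

/* Queue depth 1: one loop waits for each out-fence before committing the
 * next writeback buffer. Returns the number of vblanks between consecutive
 * writebacks (1 means a writeback on every frame). */
static uint64_t vblanks_between_writebacks(uint64_t period_us,
                                           uint64_t flush_us,
                                           uint64_t lead_us)
{
    return 1 + div_round_up(flush_us + lead_us, period_us);
}

/* Hypothetical queue depth `depth`: committing frame m needs the fence of
 * frame m - depth, which signals flush_us after vblank m - depth + 1, i.e.
 * (depth - 1) periods before vblank m. The commit must land lead_us before
 * vblank m, so every frame fits iff flush + lead <= (depth - 1) * period. */
static bool fits_every_frame(uint64_t period_us, uint64_t flush_us,
                             uint64_t lead_us, uint64_t depth)
{
    if (depth == 0)
        return false; /* no buffer to write into at all */
    return flush_us + lead_us <= (depth - 1) * period_us;
}
```

For example, with a ~16.7 ms period, 0.5 ms flush and 1 ms lead (made-up numbers), vblanks_between_writebacks(16667, 500, 1000) is 2, i.e. a writeback only every second frame, while fits_every_frame(16667, 500, 1000, 2) is true.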

As soon as you have 2 crtc you can untangle these and drive them with 2
loops, and still hit every frame with the writeback (since the separate
writeback crtc can run a tiny bit behind the display one).
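With two loops, only the writeback loop has to honour the rule from the WRITEBACK_OUT_FENCE_PTR documentation quoted at the top of the thread: the property value is the address of a 32-bit signed integer cast to a u64, and the loop waits for the returned sync_file to signal before its next commit. A minimal sketch of those two steps using plain POSIX poll(), assuming that a sync_file fd reports POLLIN once its fence has signaled. The helper names are hypothetical, and attaching the value to the atomic request (e.g. with libdrm's drmModeAtomicAddProperty()) is left out.

```c
#include <errno.h>
#include <poll.h>
#include <stdint.h>

/* Value for the writeback connector's WRITEBACK_OUT_FENCE_PTR property:
 * the address of a 32-bit signed integer, cast to a u64. On a successful
 * commit the kernel stores a sync_file fd in *slot. */
static uint64_t wb_out_fence_ptr_value(int32_t *slot)
{
    *slot = -1; /* makes a missing fence detectable after the commit */
    return (uint64_t)(uintptr_t)slot;
}

/* Wait until fence_fd becomes readable (a sync_file reports POLLIN once its
 * fence has signaled). timeout_ms < 0 waits forever.
 * Returns 0 when signaled, -ETIMEDOUT on timeout, or a negative errno. */
static int wb_wait_out_fence(int fence_fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };

    for (;;) {
        int ret = poll(&pfd, 1, timeout_ms);

        if (ret > 0) {
            if (pfd.revents & (POLLERR | POLLNVAL))
                return -EINVAL;
            return 0;
        }
        if (ret == 0)
            return -ETIMEDOUT;
        if (errno != EINTR && errno != EAGAIN)
            return -errno;
        /* Interrupted: retry (this restarts the full timeout). */
    }
}
```

The display loop never calls wb_wait_out_fence(), which is what lets it keep its own cadence while the writeback loop runs slightly behind.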

Of course if your userspace only has one redraw loop (Android suffered
from that for years, not sure it's fixed), then yes you'll slow down.
-Sima

> 
> > > > 
> > > > > > 
> > > > > > If we already have devices where you can use writeback together with real
> > > > > > outputs, then I guess that counts as an oopsie :-/
> > > > > 
> > > > > Well "works fine" fits into the "undefined behaviour" bucket, just as
> > > > > well as "corrupts your fb" does :-)
> > > 
> > > 
> > > -- 
> > > With best wishes
> > > Dmitry
> > 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch