Re: How to design a DRM KMS driver exposing 2D compositing?

Eric Anholt <eric@xxxxxxxxxx> · Mon, 11 Aug 2014 10:16:24 -0700

Daniel Vetter <daniel@xxxxxxxx> writes:

> On Mon, Aug 11, 2014 at 01:38:55PM +0300, Pekka Paalanen wrote:
>> Hi,
>> 
>> there is some hardware that can do 2D compositing with an arbitrary
>> number of planes. I'm not sure what the absolute maximum number of
>> planes is, but for the discussion, let's say it is 100.
>> 
>> There are many complicated, dynamic constraints on how many, what size,
>> etc. planes can be used at once. A driver would be able to check those
>> before kicking the 2D compositing engine.
>> 
>> The 2D compositing engine in the best case (only few planes used) is
>> able to composite on the fly in scanout, just like the usual overlay
>> hardware blocks in CRTCs. When the composition complexity goes up, the
>> driver can fall back to compositing into a buffer rather than on the
>> fly in scanout. This fallback needs to be completely transparent to the
>> user space, implying only additional latency if anything.
>> 
>> These 2D compositing features should be exposed to user space through a
>> standard kernel ABI, hopefully an existing ABI in the very near future
>> like the KMS atomic.
>
> I presume we're talking about the video core from raspi? Or at least
> something similar?

Pekka wasn't sure if things were confidential here, but I can say it:
Yeah, it's the RPi.
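A minimal sketch of the check-time decision described in Pekka's quoted proposal: the driver checks the scene and picks on-the-fly scanout or the compositing-into-a-buffer fallback without telling userspace. It assumes a made-up cost model (total source pixels fetched per frame against a fixed budget). The names plane_state, compose_mode and choose_mode are hypothetical and not part of any DRM/KMS API; the real HVS constraints are more numerous and dynamic.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-plane state; only what the cost model needs. */
struct plane_state {
    unsigned int src_w, src_h;  /* source rectangle size in pixels */
};

enum compose_mode {
    COMPOSE_ON_THE_FLY,  /* HVS composites directly during scanout */
    COMPOSE_TO_BUFFER,   /* fall back to compositing into memory first */
};

/* Assumed cost model: source pixels fetched per frame. A downscaled
 * plane reads more pixels than it covers on screen, so it costs more. */
static unsigned long plane_cost(const struct plane_state *p)
{
    return (unsigned long)p->src_w * p->src_h;
}

/* Decided at check time and never reported to userspace: the
 * fallback only adds latency, as in the quoted proposal. */
static enum compose_mode choose_mode(const struct plane_state *planes,
                                     size_t n, unsigned long budget)
{
    unsigned long total = 0;

    assert(planes != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += plane_cost(&planes[i]);
    return total <= budget ? COMPOSE_ON_THE_FLY : COMPOSE_TO_BUFFER;
}
```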

I haven't written code using the compositor interface (I just did
enough to shim in a single plane for bringup, and I'm hoping Pekka and
company can handle the rest for me :) ), but my understanding of how
you use it is this: you've got your previous frame loaded up in the HVS
(the plane compositor hardware).  When you're asked to put up a new
frame that's going to be too hard, you take some complicated chunk of
your scene and ask the HVS to use whatever spare bandwidth it has, while
it's still scanning out the previous frame, to composite that piece of
the new scene into memory.  Then, when the offline composite is done,
you ask the HVS to scan out the next frame using the original scene,
with the pre-composited temporary buffer standing in for that chunk.
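The "composite a chunk into memory, then scan out the simplified scene" step can be sketched as choosing how many planes to collapse. This assumes the chunk is a contiguous run at the bottom of the z-order (so blending order is preserved when the temporary buffer replaces it) and a per-plane cost in pixels fetched per frame. The function name and cost model are hypothetical, and the timing (running the composite in the previous frame's spare bandwidth) is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* cost[i] is an assumed fetch cost (pixels read per frame) for plane i,
 * ordered bottom (index 0) to top in z-order.
 *
 * Returns the smallest k such that pre-compositing planes [0, k) into
 * one temporary buffer plane of cost buffer_cost makes the scene fit
 * the budget. Returns 0 if it already fits, or n + 1 if even
 * collapsing every plane is not enough. */
static size_t planes_to_precomposite(const unsigned long *cost, size_t n,
                                     unsigned long buffer_cost,
                                     unsigned long budget)
{
    unsigned long total = 0;

    assert(cost != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        total += cost[i];
    if (total <= budget)
        return 0;

    /* Grow the collapsed bottom run one plane at a time; its cost is
     * replaced by the single temp-buffer plane. */
    unsigned long collapsed = 0;
    for (size_t k = 1; k <= n; k++) {
        collapsed += cost[k - 1];
        if (total - collapsed + buffer_cost <= budget)
            return k;
    }
    return n + 1;
}
```

If it returns k between 1 and n, planes 0..k-1 would be handed to the offline composite and replaced by one plane pointing at the temporary buffer in the next scanout.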

I'm pretty comfortable with the idea of having some large number of
planes preallocated, and deciding that "nobody could possibly need more
than 16" (or whatever).

My initial reaction to "we should just punt when we run out of bandwidth
and have a special driver interface for offline composite" was "that's
awful, when the kernel could just get the job done immediately, and
easily, and it would know exactly what it needed to composite to get
things to fit (unlike userspace)".  I'm trying to come up with what
benefit there would be to having a separate interface for offline
composite.  I've got 3 things:

- Avoids having a potentially long, interruptible wait in the modeset
  path while the offline composite happens.  But I think we have other
  interruptible waits in that path already.

- Userspace could potentially do something other than use the HVS to
  get the fallback done.  Video would have to use the HVS to get the
  same scaling filters applied as in the previous frame where things
  *did* fit, but I guess you could composite some 1:1 RGBA overlays in
  GL, which would have more bandwidth available to it than what you're
  borrowing from the previous frame's HVS capacity.

- Userspace could potentially use the offline composite interface for
  things besides just the running-out-of-bandwidth case.  For example,
  say it was doing a nicely filtered downscale of an overlaid video, and
  then the user hit pause and walked away: you could have a timeout that
  notices the complicated scene hasn't changed in a while, and drop from
  overlays to a single HVS-composited plane to reduce power.
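The idle-timeout idea in the third point might look like the following on the userspace side. The struct, the functions and the 2000 ms timeout used in the test are hypothetical; actually pre-compositing the scene would go through whatever offline composite interface ends up existing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical idle tracker: once the scene has been unchanged for
 * idle_ms, switch from many overlays to one HVS-composited plane to
 * save power; any scene change switches back to live overlays. */
struct idle_tracker {
    uint64_t last_change_ms;  /* time of the last scene change */
    uint64_t idle_ms;         /* assumed timeout */
    bool collapsed;           /* true while showing one pre-composited plane */
};

static void scene_changed(struct idle_tracker *t, uint64_t now_ms)
{
    t->last_change_ms = now_ms;
    t->collapsed = false;  /* back to live overlays */
}

/* Call periodically (e.g. once per vblank). Returns true exactly once
 * per idle period: when the caller should pre-composite the scene. */
static bool idle_tick(struct idle_tracker *t, uint64_t now_ms)
{
    assert(now_ms >= t->last_change_ms);
    if (!t->collapsed && now_ms - t->last_change_ms >= t->idle_ms) {
        t->collapsed = true;
        return true;
    }
    return false;
}
```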

The third one is the one I actually find kind of compelling, and it
might be switching me away from wanting no userspace visibility into the
fallback.  But I don't have a good feel for how complex our descriptions
of planes are, and how much poorly tested interface we'd be adding to
support this use case.

(Because, honestly, I don't expect the fallbacks to be hit much; my
understanding of the bandwidth equation is that you're mostly counting
the number of pixels that have to be read, and pixels that are clipped
out because somebody's overlaid on top of you don't count unless they
fall in the same burst read as visible ones.  So unless people are going
nuts with blending in overlays, or with downscaled video, it's probably
not a problem, and something that gets your pixels on the screen at all
is sufficient.)
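A sketch of the simplified bandwidth model described above, for one row of one plane: count the memory bursts touched by the plane's visible spans, so an occluded pixel only costs something when it shares a burst with a visible one. The burst size, the span representation and the function name are assumptions for illustration, not the actual HVS equation.

```c
#include <assert.h>
#include <stddef.h>

/* A visible horizontal span [x0, x1) of one plane's row, left after
 * clipping away the parts covered by opaque planes above it. */
struct span {
    unsigned int x0, x1;
};

/* Pixels fetched for one row under the assumed model: memory is read
 * in fixed bursts of burst_px pixels, and each burst touched by any
 * visible span is read once. Spans must be sorted and non-overlapping. */
static unsigned long row_fetch_px(const struct span *s, size_t n,
                                  unsigned int burst_px)
{
    unsigned long bursts = 0;
    long last = -1;  /* index of the last burst already counted */

    assert(burst_px > 0);
    for (size_t i = 0; i < n; i++) {
        if (s[i].x1 <= s[i].x0)
            continue;  /* empty span */
        long first = s[i].x0 / burst_px;
        long end = (s[i].x1 - 1) / burst_px;
        if (first <= last)
            first = last + 1;  /* burst shared with the previous span */
        if (end >= first)
            bursts += (unsigned long)(end - first + 1);
        if (end > last)
            last = end;
    }
    return bursts * burst_px;
}
```

With 16-pixel bursts, visible spans [0, 10) and [12, 40) cost 48 pixels: the occluded pixels 10 and 11 are fetched anyway because they share burst 0 with visible pixels.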
_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel