Pekka Paalanen <ppaalanen@xxxxxxxxx> writes:

> On Mon, 11 Aug 2014 19:27:45 +0200
> Daniel Vetter <daniel@xxxxxxxx> wrote:
>
>> On Mon, Aug 11, 2014 at 10:16:24AM -0700, Eric Anholt wrote:
>> > Daniel Vetter <daniel@xxxxxxxx> writes:
>> >
>> > > On Mon, Aug 11, 2014 at 01:38:55PM +0300, Pekka Paalanen wrote:
>> > >> Hi,
>> > >>
>> > >> there is some hardware that can do 2D compositing with an arbitrary
>> > >> number of planes. I'm not sure what the absolute maximum number of
>> > >> planes is, but for the discussion, let's say it is 100.
>> > >>
>> > >> There are many complicated, dynamic constraints on how many, what size,
>> > >> etc. planes can be used at once. A driver would be able to check those
>> > >> before kicking the 2D compositing engine.
>> > >>
>> > >> The 2D compositing engine in the best case (only a few planes used) is
>> > >> able to composite on the fly in scanout, just like the usual overlay
>> > >> hardware blocks in CRTCs. When the composition complexity goes up, the
>> > >> driver can fall back to compositing into a buffer rather than on the
>> > >> fly in scanout. This fallback needs to be completely transparent to the
>> > >> user space, implying only additional latency if anything.
>> > >>
>> > >> These 2D compositing features should be exposed to user space through a
>> > >> standard kernel ABI, hopefully an existing ABI in the very near future
>> > >> like the KMS atomic.
>> > >
>> > > I presume we're talking about the video core from raspi? Or at least
>> > > something similar?
>> >
>> > Pekka wasn't sure if things were confidential here, but I can say it:
>> > Yeah, it's the RPi.
>> >
>> > While I haven't written code using the compositor interface (I just did
>> > enough to shim in a single plane for bringup, and I'm hoping Pekka and
>> > company can handle the rest for me :) ), my understanding is that the
>> > way you make use of it is that you've got your previous frame loaded up
>> > in the HVS (the plane compositor hardware), then when you're asked to
>> > put up a new frame that's going to be too hard, you take some
>> > complicated chunk of your scene and ask the HVS to use any spare
>> > bandwidth it has while it's still scanning out the previous frame in
>> > order to composite that piece of new scene into memory. Then, when it's
>> > done with the offline composite, you ask the HVS to do the next scanout
>> > frame using the original scene with the pre-composited temporary buffer.
>> >
>> > I'm pretty comfortable with the idea of having some large number of
>> > planes preallocated, and deciding that "nobody could possibly need more
>> > than 16" (or whatever).
>> >
>> > My initial reaction to "we should just punt when we run out of bandwidth
>> > and have a special driver interface for offline composite" was "that's
>> > awful, when the kernel could just get the job done immediately, and
>> > easily, and it would know exactly what it needed to composite to get
>> > things to fit (unlike userspace)". I'm trying to come up with what
>> > benefit there would be to having a separate interface for offline
>> > composite. I've got 3 things:
>> >
>> > - Avoids having a potentially long, interruptible wait in the modeset
>> >   path while the offline composite happens. But I think we have other
>> >   interruptible waits in that path already.
>> >
>> > - Userspace could potentially do something else besides use the HVS to
>> >   get the fallback done. Video would have to use the HVS to get the
>> >   same scaling filters applied as the previous frame where things *did*
>> >   fit, but I guess you could composite some 1:1 RGBA overlays in GL,
>> >   which would have more BW available to it than what you're borrowing
>> >   from the previous frame's HVS capacity.
>> >
>> > - Userspace could potentially use the offline composite interface for
>> >   things besides just the running-out-of-bandwidth case. Like, it was
>> >   doing a nicely-filtered downscale of an overlaid video, then the user
>> >   hit pause and walked away: you could have a timeout that noticed that
>> >   the complicated scene hadn't changed in a while, and you'd drop from
>> >   overlays to an HVS-composited single plane to reduce power.
>> >
>> > The third one is the one I've actually found kind of compelling, and
>> > might be switching me from wanting no userspace visibility into the
>> > fallback. But I don't have a good feel for how much complexity there is
>> > to our descriptions of planes, and how much poorly-tested interface we'd
>> > be adding to support this usecase.
>>
>> The compositor should already do a rough bw guesstimate and, if stuff
>> doesn't change any more, bake the entire scene into a single framebuffer.
>> The exact same issue happens on more usual hw with video overlays, too.
>>
>> Ofc if it turns out that scanning out your yuv planes is less bw, then
>> the overlay shouldn't be stopped. But imo there's nothing special here
>> for the rpi.
>>
>> > (Because, honestly, I don't expect the fallbacks to be hit much -- my
>> > understanding of the bandwidth equation is that you're mostly counting
>> > the number of pixels that have to be read, and clipped-out pixels
>> > because somebody's overlaid on top of you don't count unless they're in
>> > the same burst read. So unless people are going nuts with blending in
>> > overlays, or downscaled video, it's probably not a problem, and
>> > something that gets your pixels on the screen at all is sufficient.)
>>
>> Yeah I guess we need to check reality here. If the "we've run out of bw"
>> case just never happens then it's pointless to write special code for it.
>> And we can always add a limit later for the case where GL is usually
>> better and tell userspace that we can't do this many planes. The exact
>> same thing with running out of memory bw can happen anywhere else, too.
>
> I had a chat with Eric last night, and our different views about the
> on-line/real-time performance limits of the HVS seem to be due to alpha
> blending.
>
> Eric has not been using alpha blending much or at all, while my
> experiments with Weston and DispmanX pretty much always need alpha
> blending (e.g. because DispmanX cannot say that only a sub-region of a
> buffer needs blending). Eric says alpha blending kills the performance.

Note, I wasn't saying anything about performance. I was just talking
about how compositing in X knows that (almost) everything is actually
opaque, so I don't have the worries about alpha blending that you
apparently do in Weston.
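To make the "rough bw guesstimate" idea a bit more concrete, here is a
minimal userspace-style sketch of the kind of heuristic a compositor
could use: add up the bytes the scanout engine would have to read per
frame for each plane, and pre-composite into a single framebuffer when
that exceeds a budget. This is purely illustrative; the structures, the
1.5 GB/s budget, and the blending penalty are made-up assumptions, not
an existing kernel, HVS, or Weston interface.

/*
 * Minimal sketch of a per-frame read-bandwidth estimate for a set of
 * planes.  All numbers and names are hypothetical.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct plane_state {
	uint32_t src_w, src_h;   /* source pixels actually sampled */
	uint32_t crtc_w, crtc_h; /* on-screen destination size */
	uint32_t bpp;            /* bytes per pixel of the buffer */
	bool blended;            /* needs alpha blending with planes below */
};

/*
 * Count bytes read per frame.  Roughly: every plane's sampled source
 * pixels are fetched, and a blended plane additionally forces the
 * pixels underneath it to be fetched (they can't be clipped out).
 */
static uint64_t estimate_read_bytes(const struct plane_state *p, int n)
{
	uint64_t bytes = 0;

	for (int i = 0; i < n; i++) {
		/* Downscaling reads more source pixels than it produces. */
		bytes += (uint64_t)p[i].src_w * p[i].src_h * p[i].bpp;

		/* Hypothetical penalty: blending re-reads the covered area. */
		if (p[i].blended)
			bytes += (uint64_t)p[i].crtc_w * p[i].crtc_h * 4;
	}
	return bytes;
}

int main(void)
{
	/* Assume a 1.5 GB/s scanout read budget at 60 fps. */
	const uint64_t budget_per_frame = 1500000000ull / 60;

	struct plane_state scene[] = {
		{ 1920, 1080, 1920, 1080, 4, false }, /* opaque desktop */
		{ 3840, 2160, 1920, 1080, 2, false }, /* downscaled video */
		{  256,  256,  256,  256, 4, true  }, /* blended OSD */
	};
	int n = sizeof(scene) / sizeof(scene[0]);

	uint64_t est = estimate_read_bytes(scene, n);

	if (est > budget_per_frame)
		printf("estimate %llu > budget %llu: pre-composite offline\n",
		       (unsigned long long)est,
		       (unsigned long long)budget_per_frame);
	else
		printf("estimate %llu fits budget: scan out planes directly\n",
		       (unsigned long long)est);
	return 0;
}

The same estimate could also drive the idle-timeout idea from the third
bullet above: if the scene hasn't changed for a while, flatten to a
single pre-composited plane regardless of whether the budget is
exceeded, purely to save power.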