Re: [RFC 0/9] nuclear pageflip

Ville Syrjälä <ville.syrjala@xxxxxxxxxxxxxxx> · Fri, 14 Sep 2012 18:48:34 +0300

On Fri, Sep 14, 2012 at 09:45:18AM -0500, Rob Clark wrote:
> On Fri, Sep 14, 2012 at 8:58 AM, Ville Syrjälä
> <ville.syrjala@xxxxxxxxxxxxxxx> wrote:
> > On Fri, Sep 14, 2012 at 08:25:53AM -0500, Rob Clark wrote:
> >> On Fri, Sep 14, 2012 at 7:50 AM, Ville Syrjälä
> >> <ville.syrjala@xxxxxxxxxxxxxxx> wrote:
> >> > On Thu, Sep 13, 2012 at 11:35:59AM -0500, Rob Clark wrote:
> >> >> On Thu, Sep 13, 2012 at 9:29 AM, Ville Syrjälä
> >> >> <ville.syrjala@xxxxxxxxxxxxxxx> wrote:
> >> >> > On Thu, Sep 13, 2012 at 08:39:54AM -0500, Rob Clark wrote:
> >> >> >> On Thu, Sep 13, 2012 at 3:40 AM, Ville Syrjälä
> >> >> >> <ville.syrjala@xxxxxxxxxxxxxxx> wrote:
> >> [snip]
> >> >> >> >
> >> >> >> > I would say this is going to be the most common use case if you consider
> >> >> >> > just the number of shipping devices. It's pretty much what every Android
> >> >> >> > phone/tablet with a HDMI port has to do.
> >> >> >>
> >> >> >> bleh, surfaceflinger kinda sucks then..
> >> >> >
> >> >> > Why? This use case is not enforced by surfaceflinger, it's just the use
> >> >> > case most devices would have.
> >> >> >
> >> >> > I don't think there's anything wrong with the way surfaceflinger is designed
> >> >> > with the prepare and commit phases. How else would you do it?
> >> >>
> >> >> well, maybe I misunderstood how surfaceflinger works, but it sounded
> >> >> like it has one prepare/commit phase across outputs, vs what weston
> >> >> compositor does where each output is rendered and flipped
> >> >> independently at the rate of that particular output.  If the two
> >> >> outputs just happen to be vsync aligned, you would end up flipping at
> >> >> the same time, but if they are not locked you don't have any artificial
> >> >> constraint in the rendering/flipping.
> >> >
> >> > OK so it's purely a pull based model, whereas surfaceflinger is more
> >> > push based.
> >> >
> >> > I suppose it might be possible to make surfaceflinger support a pull
> >> > model by driving the compositor loop through a combined signal from
> >> > multiple outputs. But IIRC it did have some timing related code in
> >> > there somewhere, so it might not be happy about it. It might also
> >>
> >> As I understood, at least in older android versions,
> >> rendering was based on a timer as there was no vblank event to
> >> userspace on most SoC platforms (which sounds strange, but so far most
> >> SoC's are using fbdev and/or crazy hacks rather than drm/kms)
> >>
> >> not sure if the timer is still there.. but I hope it goes away, it is
> >> really a horrible way to keep track of vsync
> >
> > I've only looked at ICS in any detail. At least there we used the page
> > flip event from one display to set the pace of the compositor loop.
> > IIRC JB is supposed to have some vsync related changes, but I haven't
> > looked at the code.
> >
> >> > affect the clients' rendering speed since the compositor would be
> >> > pulling their buffers from queue at non-constant speed. I don't
> >> > remember the details of the buffer management very well, so I can't be
> >> > sure though. But I probably wouldn't bother trying this, since the
> >> > straightforward approach is so simple, and the results are reasonably
> >> > good.
> >> >
> >> > The pull model does seem more flexible. But it does require a bit of
> >> > extra complexity in the compositor to avoid compositing the same scene
> >> > multiple times needlessly when multiple cloned displays are involved.
> >> > I suppose ideally you'd want to recompose for each display to minimize
> >> > visible latency, but from power usage POV it may not be a good idea.
> >>
> >> fwiw, weston is already being pretty clever about keeping track of
> >> damage and minimizing the area of the screen that must be re-rendered.
> >>  I'm not sure if SF does anything like this.
> >
> > IIRC it can do that, but the EGL implementation needs to support
> > EGL_BUFFER_PRESERVED.
> >
> > I suppose the best way to implement EGL_BUFFER_PRESERVED with
> > page flips would be to schedule the flip and immediately perform
> > a blit from the new front buffer to the new back buffer. Well,
> > unless the hardware has some more clever mechanism for it.
> >
> > Does weston depend on preserved flips too, or can it even track
> > damage independently for each buffer?
> 
> well, weston knows how many buffers are at play.  So it takes the
> union of the damage from the last time the buffer was used (well,
> currently it assumes only double buffered) and the new damage.

With more buffers it'll get a bit more complicated, as it needs to keep
accumulating the damage for all buffers. But it should still be fairly
trivial when you're in full control of the buffers.
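
A minimal sketch of that accumulation, assuming damage is tracked as a
single bounding rectangle per buffer (a real compositor would more likely
use a proper region type). All names here (struct damage_tracker,
tracker_add_damage() and so on) are made up for illustration and are not
taken from weston or surfaceflinger:

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BUFFERS 4

/* Axis-aligned rectangle; empty when x2 <= x1 or y2 <= y1. */
struct rect { int x1, y1, x2, y2; };

/* Hypothetical tracker: one pending-damage rectangle per buffer. */
struct damage_tracker {
	int n_buffers;
	struct rect pending[MAX_BUFFERS];
};

static bool rect_empty(struct rect r)
{
	return r.x2 <= r.x1 || r.y2 <= r.y1;
}

/* Bounding box of two rectangles, ignoring empty ones. */
static struct rect rect_union(struct rect a, struct rect b)
{
	struct rect r;

	if (rect_empty(a))
		return b;
	if (rect_empty(b))
		return a;
	r.x1 = a.x1 < b.x1 ? a.x1 : b.x1;
	r.y1 = a.y1 < b.y1 ? a.y1 : b.y1;
	r.x2 = a.x2 > b.x2 ? a.x2 : b.x2;
	r.y2 = a.y2 > b.y2 ? a.y2 : b.y2;
	return r;
}

static void tracker_init(struct damage_tracker *t, int n_buffers)
{
	int i;

	assert(n_buffers > 0 && n_buffers <= MAX_BUFFERS);
	t->n_buffers = n_buffers;
	for (i = 0; i < n_buffers; i++)
		t->pending[i] = (struct rect){ 0, 0, 0, 0 };
}

/* New damage is stale in every buffer until that buffer is repainted. */
static void tracker_add_damage(struct damage_tracker *t, struct rect d)
{
	int i;

	for (i = 0; i < t->n_buffers; i++)
		t->pending[i] = rect_union(t->pending[i], d);
}

/* Return what must be repainted in buffer 'buf' and mark it clean. */
static struct rect tracker_begin_repaint(struct damage_tracker *t, int buf)
{
	struct rect r = t->pending[buf];

	t->pending[buf] = (struct rect){ 0, 0, 0, 0 };
	return r;
}
```

With two buffers this degenerates to the "last damage plus new damage"
union described above; with three or more, each buffer simply collects
everything that happened since it was last rendered to.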

> This
> way it avoids need for the gl driver, which doesn't know as well what
> is going on as the app, from needing to do a back-blit.  It can do
> this because w/ drm/gbm egl winsys, eglSwapBuffers() doesn't actually
> swap the buffers on the display and weston is in charge of which
> buffer is displayed or rendered.  Weston explicitly calls page flip
> ioctl.  The good news being that it can atomically flip overlay layers
> at the same time once the new ioctl is in place.

Yeah, with EGL in the mix, as can be the case with Android, the layering
can start to work against you a little bit. Well, it's not too bad based
on my experience though.

> Maybe it is useful to look at http://github.com/robclark/kmscube .. it
> doesn't actually use planes, but shows the interaction of egl and kms.
>  Maybe I should enhance it w/ multiple rotating cubes on different
> overlays. ;-)

When doing the Medfield work, I had a test case which utilized the
video overlays and the primary plane. What it did was draw ugly 
colored rectangles on the primary layer, and positioned the overlays 
to cover those up exactly. When the atomic page flip system was
operating as intended you could only see a solid color on the screen.
It randomly changed the position and size of the rectangles and
overlays. I really need to dig that up and modify it to work with my
current code.

> >> >> note that the test phase doesn't need vblank events, and also
> >> >> shouldn't -EBUSY if there is still a pending flip[*],
> >> >
> >> > Right. Personally I'm not a fan of the EBUSY behaviour at all. Seems
> >> > a bit pointless since user space can take care of it via the event
> >> > mechanism. But I suppose you want it for omap so that you can avoid
> >> > having to write software workarounds to overcome the GO bit
> >> > limitations.
> >>
> >> I think the main issue is disconnecting an overlay from one crtc and
> >> connecting to another.. I would expect that any hw which can connect
> >> an ovl to more than one possible crtc would have the same limit (ie.
> >> have to wait until scanout on previous crtc completes), so I think
> >> EBUSY is a good way to indicate to userspace that the requested
> >> configuration is not possible *now* but would be possible in the
> >> future.
> >
> > Intel HW can do the transition automagically, but if you try to
> > combine it with other page flips, the driver would have to perform some
> > gymnastics to make things appear atomic. Of course if you'd try to swap
> > overlay A from pipe 1 to pipe 2, and overlay B from pipe 2 to pipe 1 at
> > the same time, there's just no way to do that without sacrificing
> > atomicity on one of the pipes.
> >
> > So even with such HW, it's probably easier to forget about the feature,
> > and require user space to perform the disable+enable sequence in two steps.
> 
> true, but I don't want to block the disable until vblank w/
> atomic-pageflip, and if userspace re-enables the plane on a different
> crtc before the next vblank, it would be useful for the driver to have
> a way to say 'try again later'.

Yeah, trying to do it all asynchronously in the kernel would perhaps
be too much work for little gain.

I wonder if you've thought about omap's FIFO merge. It can cause similar
issues, that is, some operations may need two vblanks to complete. And it
looks like I'll get to worry about this stuff too since there are some
watermark related wait_for_vblank() workarounds in the IVB sprite code,
sigh.

> And if we do support multiple crtc's w/ pageflip, I'm not sure if
> there is a good way to enforce two-steps.  Having a standardized way
> to tell userspace to try later seems like a good thing.

Sure, for that it seems reasonable.
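
To make the 'try again later' idea concrete, here is a rough sketch of
what the user space side could look like. It runs against a fake
in-process "driver", not the real ioctl: struct fake_plane,
fake_plane_enable(), fake_vblank() and move_plane() are all hypothetical
names, and the only assumption is that enabling a plane on a new crtc
fails with -EBUSY until the old crtc has passed a vblank:

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for driver state of one overlay plane. */
struct fake_plane {
	int crtc;             /* crtc scanning out the plane, -1 if none */
	bool disable_pending; /* disable queued, completes at next vblank */
};

/* Non-blocking disable: only takes effect at the next vblank. */
static int fake_plane_disable(struct fake_plane *p)
{
	if (p->crtc >= 0)
		p->disable_pending = true;
	return 0;
}

/* Refuse to move the plane while the old crtc still scans it out. */
static int fake_plane_enable(struct fake_plane *p, int crtc)
{
	if (p->disable_pending && p->crtc != crtc)
		return -EBUSY;
	p->disable_pending = false;
	p->crtc = crtc;
	return 0;
}

/* Simulated vblank on the old crtc: the pending disable completes. */
static void fake_vblank(struct fake_plane *p)
{
	if (p->disable_pending) {
		p->crtc = -1;
		p->disable_pending = false;
	}
}

/*
 * User space side: disable, then retry the enable after each vblank.
 * Returns the number of vblanks waited, or a negative errno.
 */
static int move_plane(struct fake_plane *p, int new_crtc, int max_vblanks)
{
	int ret, waited = 0;

	fake_plane_disable(p);
	while ((ret = fake_plane_enable(p, new_crtc)) == -EBUSY &&
	       waited < max_vblanks) {
		/* Real code would block on the vblank/flip event here. */
		fake_vblank(p);
		waited++;
	}
	return ret < 0 ? ret : waited;
}
```

The point is only that -EBUSY lets user space tell "not possible right
now" apart from "not possible at all", and pace the retry off the
existing event mechanism instead of blocking in the kernel.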

> >> >> >> Also, if you pageflip on multiple CRTC's, should there be multiple
> >> >> >> vblank events, and multiple userdata's?
> >> >> >
> >> >> > That's a bit of an open question. I was considering several options:
> >> >>
> >> >> the thing I like about one ioctl per crtc is that it avoids this whole
> >> >> question..
> >> >>
> >> >> And, I think as long as you have to update multiple different scanout
> >> >> address registers, there is always going to be a race in multi-crtc
> >> >> flipping.  Having a single ioctl does make the race smaller.  I'm not
> >> >> sure how important that point is.
> >> >
> >> > Which race?
> >>
> >> ie. if you set REG_CRTC1_ADDR just immediately before vblank and
> >> REG_CRTC2_ADDR just after
> >
> > Well, with unsynced crtcs I wouldn't call that any kind of meaningful race.
> > The same problem after all exists even with a single crtc. You either make
> > the deadline and write the register before vblank, or you don't make it
> > and end up with a repeated frame.
> 
> I meant w/ sync'd crtc's, there is still no 100% guarantee that the
> two flip at the same time.

Sure there is. That's what the vblank evade stuff gives you. I just
happen to need it even when using just one crtc because the hardware
doesn't have the necessary mechanism to flip several planes atomically.
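
The core of the evade check can be sketched roughly as below. This is a
simplified illustration, not the actual driver code: the function name,
the scanline-based parameters and the size of the evade window are all
assumptions. The idea is that if the current scanline is so close to
vblank start that a batch of register writes might straddle it, the
writes are postponed until vblank has started, so that all of them latch
at the same (next) vblank:

```c
#include <stdbool.h>

/*
 * Hypothetical helper; all values are in scanlines.
 *
 * scanline:     current scanout position
 * vblank_start: first scanline of vertical blanking
 * evade_lines:  scanlines the register writes could take, worst case
 *
 * Writes started inside [vblank_start - evade_lines, vblank_start)
 * risk being split across the vblank, so they are not safe there.
 */
static bool safe_to_write(int scanline, int vblank_start, int evade_lines)
{
	return scanline < vblank_start - evade_lines ||
	       scanline >= vblank_start;
}
```

A caller would poll or sleep until safe_to_write() returns true, and then
write all the plane (and, for synced crtcs, all the crtc) registers back
to back with interrupts disabled.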

-- 
Ville Syrjälä
Intel OTC