[RFC] Async flips

On Fri, Nov 02, 2012 at 05:45:29AM +0100, Mario Kleiner wrote:
> 
> 
> On 31.10.12 19:51, Ville Syrjälä wrote:
> > On Wed, Oct 31, 2012 at 10:44:47AM -0700, Eric Anholt wrote:
> >> Ville Syrjälä <ville.syrjala at linux.intel.com> writes:
> >>
> >>> On Tue, Oct 30, 2012 at 01:33:47PM -0500, Jesse Barnes wrote:
> >>>> The hw supports async flips through the render ring, so why not expose it?
> >>>> It gives us one more "tear me harder" option we can use in the DDX and
> >>>> for other cases where simply flipping to the latest buffer is more
> >>>> important than visual quality.
> >>>
> >>> The only reason I can see why anyone would really want async flips is
> >>> when you're restricted to double buffering. With triple buffering you
> >>> should be able to override the previous flip w/o tearing.
> >>>
> >>> Well, actually if you use the ring based flips, then you can't do the
> >>> override. My atomic page flip code can do it because it's using mmio
> >>> flips. There were also other reasons favoring mmio over ring.
> >>>
> >>> Once the atomic code is deemed ready, I would suggest we just nuke the
> >>> ring based flip code (pun intended).
> >>
> >> Can you outline what exactly your plan is for doing faster-than-vblank
> >> page flipping without tearing, and how it gets synchronized with
> >> rendering?
> >
> > The faster than vrefresh flipping simply involves overwriting the
> > display plane registers before they've been latched by the hardware.
> > This appears to work fine already.
> >
> > As far as the synchronization goes, I basically just want a callback
> > from the GPU when it's done with the buffer. I'm expecting to find
> > some kind of GPU progress interrupt that I can enable while I'm waiting
> > for the GPU to catch up. So I also need a FIFO to store the flip
> > requests in the meantime. Once the GPU tells me it's ready, I pull the
> > flip request from the queue and proceed with the display plane
> > programming.
> >
> > So the synchronization part is still quite handwavy, and I need
> > to study the hardware/driver in more detail to figure out the
> > specifics.
> >
> 
> That's cool. But please make sure that the behaviour will be somehow 
> controllable by OpenGL applications, via some OpenGL extension. I can 
> see use for different modes:
> 
> a) Normal double-buffering: For deterministic, well controlled timing - 
> That's what my type of applications need. Maximum control over what to 
> show next, based on precise and reliable flip completion timestamps.
> 
> b) Triple buffering with FIFO queueing of frames ahead, what the intel 
> ddx currently does, unfortunately for me with totally broken 
> timestamping, so all my users have to disable it in the xorg.conf - 
> quite a challenge for many Apple converts, who have trouble with the
> concept of editing configuration files. It's useful if an app manages to
> render at full refresh rate on average to smooth out occasional stalls,
> because the gpu has one frame of completed rendering queued up in 
> advance. Maybe this also allows for some power saving if an app can 
> render and queue frames ahead of time as fast as possible (race to 
> completion) and then the cpu/gpu can go to some deeper sleep state earlier?
> 
> c) Your LIFO triple-buffering, as far as I understand, with dropping
> late frames, to reduce latency/lag for things like video games.
> 

Right. I've been occasionally thinking about pushing the swap interval
handling to the kernel.

Currently user space needs to do the wait for vblank trick before
scheduling the swap, and then hope that the GPU will catch up fast
enough so that the swap will happen on the next vblank. If the kernel
handled it, it could actually guarantee the OML_sync_control remainder
behaviour (well, assuming kernel threads get scheduled in a timely
fashion), whereas the user space solution can't give such guarantees.
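
To make the "remainder behaviour" concrete, here is a rough sketch of
how the vblank count (MSC) for a swap could be picked from the
target_msc/divisor/remainder triple of OML_sync_control. The function
name swap_target_msc() is hypothetical and not part of any existing
kernel or Mesa code. The sketch also assumes that "as soon as possible"
means the next vblank, i.e. current MSC + 1.

```c
#include <stdint.h>

/*
 * Hypothetical helper: return the MSC (vblank count) at which the swap
 * should happen, following the OML_sync_control rules:
 *  - target_msc still in the future: swap exactly at target_msc
 *  - otherwise, divisor == 0: swap as soon as possible
 *  - otherwise: swap at the next MSC where MSC % divisor == remainder
 */
uint64_t swap_target_msc(uint64_t current_msc, uint64_t target_msc,
                         uint64_t divisor, uint64_t remainder)
{
    uint64_t next;

    if (current_msc < target_msc)
        return target_msc;

    /* Assumption: the earliest reachable vblank is the next one. */
    next = current_msc + 1;

    if (divisor == 0)
        return next;

    /* Distance from 'next' to the first MSC with the wanted remainder. */
    remainder %= divisor;
    return next + (remainder + divisor - next % divisor) % divisor;
}
```

A swap interval of 2 would then be divisor = 2 with a fixed remainder.
The guarantee only holds if whoever evaluates this also programs the
flip in time, which is the part the kernel could do more reliably than
user space.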

But even w/o that extra kernel feature, my code should be no worse in
that regard than the current code. You can still do the wait for vblank
trick in user space to get similar swap interval behaviour, and you can
still use as many buffers as you want. The only real difference to the
current situation is that if you schedule the flip too soon, you won't
get the EBUSY from the kernel, but instead you drop the previous flip.
But assuming the user space code is well behaved it won't try to flip
too soon, so essentially nothing will change.
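
The difference described above (drop the previous flip instead of
returning EBUSY) can be modelled with a single pending slot. This is
only an illustration, not the actual atomic page flip code: struct
flip_slot, flip_schedule() and flip_vblank() are made-up names, and the
vblank latch is simulated by a plain function call.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of one display plane's pending flip. */
struct flip_slot {
    bool     pending;  /* a flip is programmed but not yet latched */
    uint32_t fb_id;    /* framebuffer that will be latched next */
    unsigned dropped;  /* number of flips overridden before latching */
};

/*
 * Schedule a flip. If an earlier flip has not been latched yet it is
 * replaced rather than rejected with EBUSY. Returns true when an
 * earlier flip was dropped.
 */
bool flip_schedule(struct flip_slot *s, uint32_t fb_id)
{
    bool replaced = s->pending;

    if (replaced)
        s->dropped++;
    s->fb_id = fb_id;
    s->pending = true;
    return replaced;
}

/*
 * Simulated vblank: the hardware latches whatever is pending.
 * Returns the fb_id that became visible, or 0 if nothing was pending.
 */
uint32_t flip_vblank(struct flip_slot *s)
{
    if (!s->pending)
        return 0;
    s->pending = false;
    return s->fb_id;
}
```

Well behaved user space that waits for the previous flip to complete
never sees flip_schedule() return true, so for it nothing changes.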

> d) Flipping without vsync = tearing. I think this is at least useful for 
> benchmarks, although not for anything else.

This one I don't support currently. It would be possible to support it
(assuming the HW allows it). The simplest way would be to just add a
new flag to the ioctl to control this behaviour.
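
As a sketch of what such a flag could look like on the kernel side,
assuming a flags word in the ioctl arguments: the EXAMPLE_* names and
values below are invented for illustration and are not taken from any
existing header, and hw_supports_async stands in for whatever
capability check the driver would really do.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag bits for a page flip ioctl. */
#define EXAMPLE_FLIP_EVENT (1u << 0)  /* send a completion event */
#define EXAMPLE_FLIP_ASYNC (1u << 1)  /* flip immediately, may tear */
#define EXAMPLE_FLIP_FLAGS (EXAMPLE_FLIP_EVENT | EXAMPLE_FLIP_ASYNC)

/*
 * Validate the flags word of a flip request.
 * Returns 0 if acceptable, -EINVAL for unknown bits or for an async
 * request on hardware that cannot do it.
 */
int example_check_flip_flags(uint32_t flags, int hw_supports_async)
{
    if (flags & ~EXAMPLE_FLIP_FLAGS)
        return -EINVAL;
    if ((flags & EXAMPLE_FLIP_ASYNC) && !hw_supports_async)
        return -EINVAL;
    return 0;
}
```

Rejecting unknown bits up front keeps the remaining bits free for
later extensions without breaking existing user space.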

-- 
Ville Syrjälä
Intel OTC

