On Nov 16, 2011, at 7:48 PM, Matthew Garrett wrote:

> On Wed, Nov 16, 2011 at 07:27:51PM +0100, Mario Kleiner wrote:
>> It's not broken hardware, but fast ping-ponging it on and off can
>> make the vblank counter and vblank timestamps unreliable for apps
>> that need high timing precision, especially for the ones that use
>> the OML_sync_control extensions for precise swap scheduling. My
>> target application is vision science and neuroscience, where
>> (sub-)milliseconds often matter for visual stimulation.

> I'll admit that I'm struggling to understand the issue here. If the
> vblank counter is incremented at the time of vblank (which isn't the
> case for radeon, it seems, but as far as I can tell is the case for
> Intel) then how does ping-ponging the IRQ matter?
> vblank_disable_and_save() appears to handle this case.

The drm software vblank counter, which is used for scheduling swaps
etc., gets incremented in the gpu's vblank irq handler. The gpu's
hardware counter gets incremented somewhere inside the vblank
interval. The increments don't happen at the same point in time. The
race is between the vblank on/off code, which gets scheduled either
by the timeout timer for the "off" case or by usercode for the "on"
case, and the gpu's hardware vblank counter.
The existing code uses a lock (vblank_time_lock) to prevent some
races between itself and the vblank irq handler, but it can't "lock"
the gpu, so if the enable/disable code executes while the gpu is
scanning out the vblank interval, it is undefined whether the final
vblank count will be correct or off by one. Vblank duration is
somewhere up to 4.5% of the refresh interval duration, so there's
your up-to-4% chance of getting it wrong.
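To put numbers on it: at 60 Hz the refresh interval is about 16.7
msecs, so up to 4.5% of that is up to roughly 0.75 msecs of vblank per
frame, and an enable/disable that lands at a random point in time has
a corresponding few-percent chance of hitting it.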
If one could reliably avoid enabling/disabling in the problematic
time period, the problem would go away.
I think making the vblank off delay driver-specific via these
patches is a good idea. Lowering the timeout to something like a few
refresh cycles, maybe somewhere between 50 msecs and 100 msecs, would
also be fine by me. I'd still like to keep some drm config option
that lets users disable or override the vblank off delay.
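For reference, the knob I mean already exists as the global
"vblankoffdelay" module parameter of drm.ko (settable at runtime via
/sys/module/drm/parameters/vblankoffdelay). A per-driver value or a
user override would feed into roughly the logic below, quoted from
memory from drm_vblank_put() in drm_irq.c, so don't take it verbatim:

/* Roughly (from memory, not verbatim) how the current off delay works:
 * when the last user drops its vblank reference, a timer is armed that
 * disables the vblank irq "vblankoffdelay" msecs later. A value of 0
 * never arms the timer, i.e. keeps vblank irqs on once enabled.
 */
void drm_vblank_put(struct drm_device *dev, int crtc)
{
	BUG_ON(atomic_read(&dev->vblank_refcount[crtc]) == 0);

	/* Last user schedules the delayed interrupt disable. */
	if (atomic_dec_and_test(&dev->vblank_refcount[crtc]) &&
	    (drm_vblank_offdelay > 0))
		mod_timer(&dev->vblank_disable_timer,
			  jiffies + ((drm_vblank_offdelay * DRM_HZ) / 1000));
}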

> Does the timeout serve any purpose other than letting software
> effectively prevent vblanks from being disabled?

With perfect drivers and gpus in a perfect world, no. In reality
there's the race I described above, and nouveau and all other drivers
except intel and radeon. The vblank irq also drives timestamping of
vblanks, one update per vblank. The timestamps are cached if a client
needs them in between updates. Turning off the vblank irq invalidates
the timestamps. radeon and intel can recreate the timestamp anytime
as needed, but nouveau lacks this atm, so timestamps remain invalid
for a whole video refresh cycle after the vblank irq is turned on
again. We have patches for nouveau kms almost ready, so only the race
mentioned above would remain.
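To illustrate what "recreate the timestamp anytime" means: intel and
radeon wire up two driver hooks roughly like the sketch below
(simplified, from memory, with the my_* names as placeholders), so the
drm core can compute a fresh vblank timestamp from the current scanout
position whenever one is requested, instead of only handing out the
value cached at vblank irq time. nouveau doesn't implement these yet:

/* Hook 1: report the crtc's current scanout position. */
static int my_get_scanout_position(struct drm_device *dev, int crtc,
				   int *vpos, int *hpos)
{
	/* Read the crtc's scanline / pixel position registers here;
	 * also OR in DRM_SCANOUTPOS_INVBL while inside vblank.
	 */
	*vpos = my_read_crtc_vpos(dev, crtc);	/* hypothetical helper */
	*hpos = my_read_crtc_hpos(dev, crtc);	/* hypothetical helper */
	return DRM_SCANOUTPOS_VALID | DRM_SCANOUTPOS_ACCURATE;
}

/* Hook 2: compute a vblank timestamp on demand. The drm core helper
 * converts the scanout position into "time since/until start of
 * vblank" and corrects the current system time by that amount.
 */
static int my_get_vblank_timestamp(struct drm_device *dev, int crtc,
				   int *max_error,
				   struct timeval *vblank_time,
				   unsigned flags)
{
	return drm_calc_vbltimestamp_from_scanoutpos(dev, crtc, max_error,
						     vblank_time, flags,
						     my_crtc_for(dev, crtc));
						     /* ^ hypothetical helper */
}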
The intel and radeon kms drivers implement everything that's needed
to make it mostly work, except for a small race between the cpu and
gpu in the vblank_disable_and_save() function
<http://lxr.free-electrons.com/source/drivers/gpu/drm/drm_irq.c#L101>
and drm_update_vblank_count(). It can cause an off-by-one error when
reinitializing the drm vblank counter from the gpu's hardware counter
if the enable/disable function is called at the wrong moment, while
the gpu's scanout is inside the vblank interval; see the comments in
the code. I have a rough idea for a patch that could detect when the
race happens and retry the hw counter queries to fix it. Without that
patch, there's some chance between 0% and 4% of being off-by-one.
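In grossly simplified pseudocode, the bookkeeping that can go wrong
looks like this; it is just the structure of the problem, not the real
code:

u32 cur_vblank, diff;

/* At vblank irq disable time: */
dev->last_vblank[crtc] = dev->driver->get_vblank_counter(dev, crtc); /* sample A */

/* ... vblank irqs are off, the hw counter keeps counting ... */

/* At re-enable time: */
cur_vblank = dev->driver->get_vblank_counter(dev, crtc);             /* sample B */
diff = cur_vblank - dev->last_vblank[crtc];
atomic_add(diff, &dev->_vblank_count[crtc]);

/* The software counter increments at vblank irq time, but the hardware
 * counter increments at some other point inside the vblank interval.
 * If sample A or sample B is taken while scanout is inside vblank, the
 * hw counter may tick just before or just after the sample while the
 * sw counter and timestamps reflect the other state, so "diff" and the
 * final count can end up off by one.
 */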

> For Radeon, I'd have thought you could handle this by scheduling an
> irq for the beginning of scanout (avivo has a register for that) and
> delaying the vblank disable until you hit it?

For Radeon there is such an irq, but iirc we had some discussions on
this, also with Alex Deucher, a while ago, and some of those irqs
weren't considered very reliable or were already used for other
stuff. The idea I had goes like this:
Use the crtc scanout position queries together with the vblank
counter queries inside some calibration loop, maybe executed after
each modeset, to find out the scanline range in which the hardware
vblank counter increments -- basically a forbidden range of scanline
positions where the race would happen. Then at each vblank off/on,
query the scanout position before and after the hw vblank counter
query. If, according to the scanout positions, the vblank counter
query happened within the forbidden window, retry the query. With a
well-working calibration this should add no delay in most cases, and
a delay of a few dozen microseconds (= the duration of a few
scanlines) to the on/off code in the worst case.
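A rough sketch of that idea in pseudocode; forbidden_min/forbidden_max
would come from the calibration loop after each modeset, and none of
this exists in the drm core today:

/* Query the hw vblank counter without racing against its increment.
 * forbidden_min..forbidden_max is the calibrated scanline range in
 * which the hw counter increments; wraparound at the end of the frame
 * is glossed over here.
 */
static u32 race_free_hw_vblank_count(struct drm_device *dev, int crtc,
				     int forbidden_min, int forbidden_max)
{
	int vpos_before, vpos_after, hpos;
	u32 count;

	for (;;) {
		dev->driver->get_scanout_position(dev, crtc, &vpos_before, &hpos);
		count = dev->driver->get_vblank_counter(dev, crtc);
		dev->driver->get_scanout_position(dev, crtc, &vpos_after, &hpos);

		/* Safe if both samples lie on the same side of the
		 * forbidden range: then the counter query cannot have
		 * raced with the increment.
		 */
		if ((vpos_before < forbidden_min && vpos_after < forbidden_min) ||
		    (vpos_before > forbidden_max && vpos_after >= vpos_before))
			return count;

		/* Otherwise wait a few scanlines worth of time and retry.
		 * With a good calibration this path is rarely taken and
		 * costs at most a few dozen microseconds.
		 */
		udelay(10);
	}
}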
With that and the pending nouveau patches in place, I think we
wouldn't need the vblank off delay anymore on such drivers.
On current nouveau kms, disabling vblank irqs guarantees you wrong
vblank counts and wrong vblank timestamps after turning them on
again, because the kms driver doesn't implement the hook for
hardware vblank counter queries correctly. The drm vblank counter
misses all counts during the vblank-irq-off period. Other timing
related hooks are missing as well. I have a couple of patches queued
up, and some more to come, for the ddx and kms driver to mostly fix
this. NVidia gpus only have hardware vblank counters for NV-50 and
later; fixing this for earlier gpus would require some emulation of
a hw vblank counter inside the kms driver.
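Just as a thought experiment, such an emulation could look something
like this; nothing like it exists in nouveau today and the names are
made up:

/* Hypothetical emulation of a missing hw vblank counter on pre-NV50:
 * count normally in the vblank irq handler, and extrapolate missed
 * counts from elapsed time while irqs are off.
 */
struct emulated_vblank_counter {
	u32 count;		/* incremented in the vblank irq handler */
	ktime_t last_irq;	/* timestamp of the last counted vblank */
	s64 frame_ns;		/* measured duration of one refresh cycle */
	bool irqs_enabled;
};

static u32 nv_emulated_vblank_count(struct emulated_vblank_counter *ec)
{
	s64 elapsed;

	if (ec->irqs_enabled)
		return ec->count;

	/* Irqs are off: estimate how many vblanks we missed. Only as
	 * good as the clock and the measured frame duration, so it can
	 * itself be off by one near a frame boundary.
	 */
	elapsed = ktime_to_ns(ktime_sub(ktime_get(), ec->last_irq));
	return ec->count + (u32) div64_s64(elapsed, ec->frame_ns);
}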

> I've no problem with all of this work being case by case.

Me neither. I'm just saying that if you went to the extreme and
disabled vblank irqs immediately, with zero delay, without reliably
fixing the problem I mentioned, you'd get those off-by-one counts,
which would be very bad for apps that rely on precise timing. Doing
so would basically make the whole oml_sync_control implementation
mostly useless.
Apps that rely on the vblank counter being totally reliable over
long periods of time would be in a bad situation with a lowered
vblank off delay, but that's probably not a good assumption to make
in general anyway. Toolkits like mine, which are more paranoid, can
currently work fine as long as the off delay is at least a few video
refresh cycles. I do the following for scheduling a reliably timed
swap:
1. Query current vblank counter current_msc and vblank timestamp
current_ust.
2. Calculate a target vblank count target_msc, based on current_msc,
current_ust and some target time from usercode.
3. Schedule a bufferswap for target_msc.
As long as the vblank off delay is long enough that vblanks don't
get turned off between steps 1 and 3, everything is fine; otherwise
bad things will happen.
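In (simplified) client code the three steps look roughly like this.
It is only a sketch: error handling and extension checks are omitted,
and schedule_timed_swap / onset_seconds_from_now are names made up
for the example.

/* Sketch of the 3-step swap scheduling described above, using the
 * GLX_OML_sync_control entry points.
 */
#include <GL/glx.h>
#include <GL/glxext.h>
#include <stdint.h>

void schedule_timed_swap(Display *dpy, GLXDrawable drawable,
			 double onset_seconds_from_now, double refresh_hz)
{
	PFNGLXGETSYNCVALUESOMLPROC pglXGetSyncValuesOML =
		(PFNGLXGETSYNCVALUESOMLPROC)
		glXGetProcAddress((const GLubyte *) "glXGetSyncValuesOML");
	PFNGLXSWAPBUFFERSMSCOMLPROC pglXSwapBuffersMscOML =
		(PFNGLXSWAPBUFFERSMSCOMLPROC)
		glXGetProcAddress((const GLubyte *) "glXSwapBuffersMscOML");
	int64_t ust, msc, sbc, target_msc;

	/* 1. Baseline: current vblank count 'msc' and timestamp 'ust' of
	 *    when scanout of the current frame started (microseconds on
	 *    the open source DRI2 stack).
	 */
	pglXGetSyncValuesOML(dpy, drawable, &ust, &msc, &sbc);

	/* 2. Compute the target vblank count from the baseline and the
	 *    requested onset time.
	 */
	target_msc = msc + (int64_t)(onset_seconds_from_now * refresh_hz + 0.5);

	/* 3. Schedule the swap: with divisor and remainder both 0, the
	 *    swap happens at the first vblank with count >= target_msc.
	 */
	pglXSwapBuffersMscOML(dpy, drawable, target_msc, 0, 0);
}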
Keeping a way to override the default off delay would be good, to
allow poor scientists to work around potentially broken drivers or
gpus in the future. @Matthew: I'm appealing here to your
ex-Drosophila-biologist heritage ;-)

> If vblanks are disabled and then re-enabled between 1 and 3, what's
> the negative outcome?

1. glXGetSyncValuesOML() gives you the current vblank count and a
timestamp of exactly when scanout of the corresponding video frame
started. These values are the baseline for any vblank-based swap
scheduling or any other vblank-synchronized actions, e.g., via
glXWaitForMscOML().
2. The app calculates stuff based on 1., but gets preempted or
experiences scheduling delay on return from 1., or the x-server gets
preempted or scheduled away. Whatever, time passes...
3. glXSwapBuffersMscOML() or glXWaitForMscOML() is used to schedule
something to occur at a target vblank count, based on the results of 1.
As long as vblanks don't get disabled between 1 and 3, everything is
consistent. Worst-case scheduling delays or preemption of more than a
few (dozen) milliseconds are very rare, so even a lowered vblank off
delay of one or two refresh cycles would solve the problem. I assume
that most toolkits with needs for timing precision would follow a
three-step strategy like this one.
If vblank irqs get disabled immediately after step 1 and reenabled
at step 3, and this triggers an off-by-one error due to the race,
then the calculated target vblank count from steps 1/2 is invalid
and useless for step 3, and many vision scientists who have only
recently made friends with Linux will make very sad faces. Due to
the nature of many of the stimulation paradigms used, there is also
a high chance that many of the enables and disables would happen
inside, or very close to, the problematic vblank interval, where the
off-by-one error is most likely.
I agree with your patches. I just don't want vblank irqs to be
disabled immediately, with a zero timeout, before the remaining
races are fixed; that would surely break things, at least for
scientific applications. And I'd like to keep some optional
parameter that allows desperate users to disable the vblank off
mechanism in case they encounter a bug.
-mario
*********************************************************************
Mario Kleiner
Max Planck Institute for Biological Cybernetics
Spemannstr. 38
72076 Tuebingen
Germany
e-mail: mario.kleiner@xxxxxxxxxxxxxxxx
office: +49 (0)7071/601-1623
fax: +49 (0)7071/601-616
www: http://www.kyb.tuebingen.mpg.de/~kleinerm
*********************************************************************
"For a successful technology, reality must take precedence
over public relations, for Nature cannot be fooled."
(Richard Feynman)