Re: [RFC PATCH] drm: Fix off-by-one races on vblank disable

Mario Kleiner <mario.kleiner@xxxxxxxxxxxxxxxx> · Thu, 17 Nov 2011 23:54:31 +0100

Jesse, cc'ing you in case you have thoughts about this for the intel  
side of things?

On Nov 17, 2011, at 4:39 PM, Andrew Lutomirski wrote:

2011/11/17 Michel Dänzer <michel@xxxxxxxxxxx>:
[ Dropping intel-gfx list from CC, as it automatically rejects posts
from non-subscribers ]

On Mit, 2011-11-16 at 21:39 -0800, Andy Lutomirski wrote:
There are two possible races when disabling vblanks.  If the IRQ
fired but the hardware didn't update its counter yet, then we store
too low a hardware counter.  (Sensible hardware never does this.
Apparently not all hardware is sensible.)

The thing is, 'the' IRQ and 'the' hardware counter aren't necessarily
about the same thing. We want an IRQ which triggers at the  
beginning of
vertical blank, but the Radeon hardware counters increment at the
beginning of scanout, i.e. at the end of vertical blank. Does that  
make
the hardware 'broken' or 'not sensible'?

Perhaps not.  I think that using that counter as the return value of
get_vblank_counter is a bit unfortunate in that case, though.  Anyway,
I'm not particularly attached to the comment as written.

Am I correct in guessing that Radeon fires its vblank irq at the
beginning of vblank instead of at the end  And doesn't this mean that
Radeon has a separate issue when vblank_get happens during the vblank
interval?  A trick similar to drm_calc_vbltimestamp_from_scanoutpos
ought to be able to fix that.

The Radeon's i tested (R500, R600) fire the vblank irq even a few  
scanlines before the start of vblank. I think most gpu's fire very  
close to the beginning of vblank, because one wants to use the vblank  
irq to trigger some work that needs to be done within the vblank  
interval. Some Radeon's sometimes fire a spurious vblank irq at the  
moment vblank irq's get enabled, althought that could also be some  
tiny bug in the kms driver, maybe missing some acknowledge of a  
pending irq somewhere in the vblank off or on function. That's the  
reason for some of the fun inside  
drm_calc_vbltimestamp_from_scanoutpos() and drm_handle_vblank(), to  
associate a vblank irq and the vblank timestamps with the correct  
vblank interval, even if the irq handler executes a bit outside the  
vblank interval.

If, on the other hand, the counter updated but the IRQ didn't fire
yet, we store too high a counter.

We handled the former case with a heuristic based on timestamps and
we did not handle the latter case.  By saving a little more state,
we can handle both cases exactly: all we need to do is watch for
changes in the difference between the hardware and software vblank
counts.

I'm afraid that can't work:

Some (AFAIR also Intel) hardware resets the counter to 0 when the  
CRTC
is disabled / enabled (e.g. for DPMS, or a modeset). That's why we  
ended
up only counting interrupts while the IRQ is enabled, and only  
using the
hardware counter to fill in while the IRQ is disabled. The hardware
counter cannot be assumed to have any defined behaviour between  
enabling
and disabling the IRQ.

To compensate for this, the drivers call drm_vblank_pre_modeset  
(which
enables the IRQ, which also updates the virtual counter from the
hardware counter) before disabling the CRTC and  
drm_vblank_post_modeset
(which disables the IRQ, which also records the hardware counter)  
after
enabling the CRTC.

Shouldn't this be fixable, though, by adding appropriate logic in
drm_vblank_pre_modeset and drm_vblank_post_modeset?

Would this be a lot simpler if drm had a function (or call to the
driver) to compute both the vblank count and the last vblank
timestamp?  That way the irq handler, drm_update_vblank_count,
vblank_disable_and_save, and pre/post modeset could keep everything
exactly in sync.  IIRC i915 has this capability in hardware, and if
Radeon indeed updates the frame count exactly at the end of the vblank
interval, it could compute this exactly as well.

I think Andrews idea is very close to what i proposed in Matthews RFC  
e-mail thread, if we use the scanout position as a "clock", except we  
could get rid of the calibration loop, the part i like least about my  
proposal, if we know for sure when each gpu increments its hardware  
counter.

At least intel, radeon and nouveau (hopefully soonish - patches  
almost ready for submission) expose the dev->driver- 
>get_scanout_position() hook, so we know the scanline at time of  
vblank counter query. If we knew at which scanline each gpu  
increments its counter we could simply compare against scanline at  
counter query time and decide if we need to add 1 to the value or  
not. We could make the hardware counter value look like as if it is  
from a hardware counter that always increments at leading edge of  
vblank, which i think would be consistent with what the irq driven  
vblank increment does.

Pseudocode (with bonus goto included!):

// To be called from vblank irq on/off routines instead of driver- 
>get_vblank_count(crtc):
uint32 get_effective_hw_vblank_count(crtc)
{

retry:

    uint32 scanline_pre = driver->get_scanout_position(crtc);
    uint32 hw_count = driver->get_vblank_count(crtc);
    uint32 scanline_post = driver->get_scanout_position(crtc);

    if ((scanline_pre outside vblank) && (scanline_post outside  
vblank) {
        // Inside active scanout -> All counters stable -> no problem:
        return hw_count;
    }

    // Inside vblank -> careful!
    if (scanline_where_gpu_increments_its_counter intersects  
interval [scanline_pre ; scanline_post]) {
	// Exact time of get_vblank_count() query "uncertain" wrt. hw  
counter increment.
	cpu_relax(); // or something similar...
        goto retry;
    }

    // Inside vblank. Query happened before or after hw increment?
    if (scanline_post < scanline_where_gpu_increments_its_counter) {
        // Query happened before hw counter increment for sure. Fake
        // the increment has already happened:
        hw_count++;
    }

    return hw_count;
}

Something like this? The worst case busy waiting time would be maybe  
2 scanline durations ( < 50 microseconds) in the very rare case where  
we hit exactly the scanline where hw counter increments. That amount  
of busy waiting sounds acceptable to me, given that we'd still keep a  
vblank off delay of a few milliseconds to avoid many redundant on/off  
transitions during a typical OpenGL bufferswap and that this never  
gets directly called from interrupt context.

We could put this into the drm, then the drivers would need to tell  
the drm the scanline where increment happens, or we could replace the  
driver->get_vblank_count() hook in each driver with some variant of  
this? Drivers with such a thing could lower the vblank off delay to a  
few msecs, other drivers would leave it at a high value.

Thoughts?
-mario

_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel