https://bugs.freedesktop.org/show_bug.cgi?id=38800

--- Comment #21 from Mario Kleiner <mario.kleiner@xxxxxxxxxxxxxxxx> 2011-07-06 13:54:14 PDT ---

@Simon: Michel and your test code are right: the pageflip is only programmed about 2 scanlines after the end of vblank, so the crtc waits for another full refresh cycle before flipping at the start of the next vblank.

Looking at the drm.debug=0xf log, there is a huge delay between entering the radeon irq handler and reaching the vblank handling code:

[68774.608689] [drm:evergreen_irq_process], r600_irq_process start: rptr 8528, wptr 8544
[68774.609052] [drm:drm_calc_vbltimestamp_from_scanoutpos], crtc 0 : v 5 p(1154,2)@ 1309862042.323318 -> 1309862042.323284 [e 3 us, 0 rep]
[68774.609069] [drm:evergreen_page_flip], Update pending now high. Unlocking vupdate_lock.
[68774.609077] [drm:evergreen_irq_process], IH: D1 vblank

-> 68774.609052 - 68774.608689 -> 363 microseconds!

That's much larger than anything I've ever seen during testing of that code path, even on rather ancient hardware. The whole vblank interrupt handler takes almost 400 usecs to execute. That probably explains why the flip is scheduled so late and misses the deadline with reduced blanking.

I think it would be good to find out where so much time is spent. As a first test, you could set the drm.timestamp_precision_usec module parameter to zero. That skips the high-precision timestamping and does a plain do_gettimeofday() call instead, which would show how much of the time is spent there.

@Michel: We wait with programming the pageflip until the vblank irq because it was the simplest way to get scheduling, pageflip completion and timestamping for pageflip events done reliably. We thought through various other methods, but each turned out to have some races:

Alex told me that older Radeons (R100-R500) don't have pageflip completion irqs, so we couldn't use those to detect and timestamp when the pageflip was really done.
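As a cross-check of the timing figures read off the drm.debug excerpt quoted above, the deltas can be computed mechanically. This is only a minimal sketch: the helper names (`parse`, `deltas`, `LOG`) are made up for illustration, the four log lines are copied from this comment, and it assumes the usual dmesg `[seconds.microseconds]` prefix.

```python
import re

# The four drm.debug=0xf lines quoted in this comment (hand-written
# annotation after "IH: D1 vblank" left out).
LOG = """\
[68774.608689] [drm:evergreen_irq_process], r600_irq_process start: rptr 8528, wptr 8544
[68774.609052] [drm:drm_calc_vbltimestamp_from_scanoutpos], crtc 0 : v 5 p(1154,2)@ 1309862042.323318 -> 1309862042.323284 [e 3 us, 0 rep]
[68774.609069] [drm:evergreen_page_flip], Update pending now high. Unlocking vupdate_lock.
[68774.609077] [drm:evergreen_irq_process], IH: D1 vblank
"""

# Assumed line shape: "[sec.usec] [drm:function_name], message"
STAMP = re.compile(r"^\[\s*(\d+)\.(\d{6})\]\s+\[drm:(\w+)\]")


def parse(log):
    """Return (timestamp in microseconds, function name) for each drm line."""
    events = []
    for line in log.splitlines():
        m = STAMP.match(line)
        if m:
            sec, usec, func = m.groups()
            # Integer arithmetic avoids float rounding on large uptimes.
            events.append((int(sec) * 1_000_000 + int(usec), func))
    return events


def deltas(events):
    """Return (function name, microseconds since the first event)."""
    start = events[0][0]
    return [(func, t - start) for t, func in events]


if __name__ == "__main__":
    for func, dt in deltas(parse(LOG)):
        print(f"{dt:4d} us  {func}")
```

For the excerpt above this gives 363 us until the vblank timestamping code is reached and 388 us until the "IH: D1 vblank" message, i.e. the "almost 400 usecs" for the whole handler. Running the same script over a log captured with drm.timestamp_precision_usec=0 would show whether the first gap shrinks.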
We first thought about scheduling the pageflip via a packet in the command stream and only checking for pageflip completion in the vblank irq handler. But that has a couple of funny race conditions, especially if the pageflip is scheduled close to or inside the target vblank interval. Those races can make pageflip completion and timestamping unreliable, and can also lead to large latency when scheduling a flip with the dri2 vblank event method (glXSwapBuffersMscOML() etc.) if multiple clients are rendering, e.g., to multiple displays. Not good.

The current implementation is relatively simple and robust, and common to all Radeons, at least as long as the vblank irq doesn't consistently miss the whole vblank interval due to large irq execution times like here ;-)

My own interest is in getting very reliable, sub-millisecond precise timestamps for my applications (neuroscience stuff), but problems there would also cause screen corruption and other trouble for the average user.

Looking at the current code, R600 and later seem to have pageflip completion irqs. They are currently acknowledged by the irq handler, but not used for anything. For >= R600 maybe we could do the pageflip completion/timestamping in a dedicated pageflip irq handler and do the programming of the pageflip in the fence interrupt handler? That would be as early as possible, i.e. as soon as the backbuffer is swap ready.

One could move the calls to the "flip programming part" of radeon_crtc_handle_flip() from the vblank irq handler to the fence irq handler, basically unmodified and with minimal overhead, and then put the "flip completion part" of radeon_crtc_handle_flip() in a separate function, called from the pageflip irq handler.

There's a little issue then with the ordering of vblank irqs with respect to
pageflip completion irqs, and we'd need to take care of this for correct timestamping. But we can solve that in the same way as we do for the intel-kms driver, which also uses pageflip completion interrupts and has to handle this ordering to get correct timestamps.

For a future pageflip ioctl() v2 it would make sense to allow synchronous flipping of multiple crtcs with one ioctl() invocation, to allow for tear-free multi-display swaps and for implementation of things like swap group extensions. This could be easier if pageflips are scheduled from within the fence irq handler or even earlier. With the current implementation and six display heads, like on evergreen/eyefinity hardware, it can get difficult to implement reliable swaps across multiple displays.

-mario