https://bugs.freedesktop.org/show_bug.cgi?id=38800 --- Comment #26 from Mario Kleiner <mario.kleiner@xxxxxxxxxxxxxxxx> 2011-07-06 18:02:26 PDT --- Ok, i looked at the patch. The way it is now it won't work reliably and is vulnerable to the races we tried to remove in the current implementation. First, as far as i understood, the pageflip ioctl() is supposed to be asynchronous, non-blocking and fast: <http://www.mail-archive.com/dri-devel@xxxxxxxxxxxxxxxxxxxxx/msg39857.html> With the patch, the ioctl() doesn't block until pageflip completion, but it does add approx. 0.7 msecs for waiting until outside of vblank whenever a flip is scheduled more than 1 frame into the future, because then the ioctl() is usually called within vblank. It blocks in radeon_bo_wait() until all pending rendering has completed. If you have a large amount of rendering to do, or multiple clients rendering to multiple displays, i'd assume the blocking time could be substantial. It would block the x-server's main loop for that time, not just the actual client app which submits the rendering work. I don't know how much real world sluggishness this adds to the GUI, compared to other current sources of lag, but it doesn't sound like such a good idea to me to block the x-server there to avoid a little bit of complexity if we don't need to block and will have more displays in the future, esp. on 6-display radeon gpu's. If you think about having swap group support for synchronized swaps across different drawables or similar features in the future, then it would also be better to stay away from blocking in the ioctl() itself. If we move the pageflip programming from the vblank irq handler to the fence irq handler we could avoid blocking in the ioctl, with basically the same complexity as now. The bigger problem for me is that the current patch doesn't wait reliably until the flip has completed before it sends the pageflip completion event. Getting accurate vblank timestamps is not the problem. That is already solved by the drm_vblank_count_and_time() function. It returns a timestamp which corresponds to start of scanout at the end of the most recent/current vblank, just as the OML_sync_control spec requires. I tested r500, r600 and a HD-5000 eyefinity 4-display setup with special measurement equipment and the timestamps are almost microsecond accurate, compensating for any irq delays. The problem is making sure that we emit the pageflip completion event during the video frame in which the pageflip completed, not one or multiple refresh cycles later or earlier. Otherwise the timestamps are meaningless, even dangerous for applications like mine that really must rely on them, and the whole synchronization of rendering to bufferswaps and throttling is broken and we'll have beautiful graphics glitches. The proposed patch schedules radeon_crtc->unpin_work for the vblank irq handler, then blocks on radeon_bo_wait() for an unknown amount of time, then programs the flip. During this time a vblank irq could already do a too early unpin and emission of a wrong pageflip completion event and timestamp. The "wait until outside vblank" is not reliable. E.g., assume the ioctl() is called a few microseconds before the vblank, detects it is not in vblank, therefore goes ahead, gets delayed for some reason (interrupt, preemption by higher prio thread, whatever) while the gpu enters the vblank, and then programs the flip just inside the vblank -> boom. As i had to learn, vbank irq's on radeon's can also happen a few scanlines before the start of vblank, so it can happen that the check for pageflip completion would run before the flip happens, detect that no flip happened, and then the flip would happen a few microseconds later and go unnoticed for a refresh cycle and we'd be back to 30 fps on a 60 hz display. I'm not saying the proposed patch couldn't be fixed to handle this correctly, but it would require a bit of extra locking and communication between the ioctl() in process context and the vblank irq handler, and some protection of the ioctl() against preemption, to avoid stalling the irq handler on some shared locks etc. I'm not sure if the correctly behaving end result wouldn't be as "complex" as the current implementation. The current method is simple in the sense that all timing sensitive stuff happens sequentially in a single "irq thread" - the gpu irq handler can't race against itself. There can be races between the gpu and the cpu, which i think we took care of, but it avoids possible additional races between scheduling code in process context and irq context. It also tries to make sure that it emits the pageflip completion as soon as possible and doesn't delay it by a frame just because of an unlucky order of execution. My proposal and Simons patch are very similar in spirit, just that i'd program the flip from within the fence interrupt handler to avoid potential races between ioctl and irq and to avoid blocking in the ioctl(). -- Configure bugmail: https://bugs.freedesktop.org/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug. _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/dri-devel