Re: [PATCH 1/2] Revert "drm/radeon: remove drm_vblank_get|put from pflip handling"

Christian König <deathsimple@xxxxxxxxxxx> · Wed, 18 Jun 2014 11:14:29 +0200

On 18.06.2014 07:53, Michel Dänzer wrote:
> On 17.06.2014 20:41, Christian König wrote:
>> On 17.06.2014 12:12, Michel Dänzer wrote:
>>> From: Michel Dänzer <michel.daenzer@xxxxxxx>
>>>
>>> This reverts commit 75f36d861957cb05b7889af24c8cd4a789398304.
>>>
>>> drm_vblank_get() is necessary to ensure the DRM vblank counter value is
>>> up to date in drm_send_vblank_event().
>>>
>>> Seems to fix weston hangs waiting for page flips to complete.
>>>
>>> Signed-off-by: Michel Dänzer <michel.daenzer@xxxxxxx>
>> Both patches are: Reviewed-by: Christian König <christian.koenig@xxxxxxx>
> Thank you.

> Looking into these issues has got me thinking about the use of the page
> flip interrupt: If the page flip interrupt arrives before the corresponding
> vertical blank interrupt, the DRM vblank counter will be lower than
> expected by 1 in drm_send_vblank_event(). I suspect this is the cause of
>
>   (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc [x-1] < target_msc [x]
>
> messages in the X log file which have been popping up in bug reports lately.
> This also results in 0s being returned to the client for the MSC and
> timestamp of the swap completion, which could cause all kinds of bad
> behaviour.
First of all, thanks for looking into it. Are you getting this on 3.16 or
3.15?

I don't think that the pflip irq is raised earlier than the vblank, but
on 3.16 it might actually be that we program the flip into the hardware
so fast that we do it one frame earlier than planned.

> The easy way to avoid that would be to stop using the page flip interrupt
> for this again. Could there be another solution for the issues you
> addressed by using it?
The original problem was that programming the flip in the vblank event
actually doesn't work reliably because of the underlying hardware double
buffering. We just can't tell if the flip will complete in this frame or
if the vblank interrupt was processed so late that it will happen in the
next frame.

We could just busy loop until either the pending bit or the bit for the
update period becomes zero, but even busy waiting for the pending bit to
go high in an interrupt handler, like we did before, is quite questionable.
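
For illustration, the busy loop described above would look roughly like the sketch below. It is a standalone userspace example, not radeon code: the bit names, the status-read callback and the fake register are all made up here and do not correspond to real hardware registers. The poll count is bounded because spinning without a limit in an interrupt handler is exactly the questionable part.

```c
#include <stdint.h>

/* Hypothetical bit layout, for illustration only. */
#define FLIP_PENDING      (1u << 0)
#define IN_UPDATE_PERIOD  (1u << 1)

/* Callback that reads the (hypothetical) flip status register. */
typedef uint32_t (*read_status_fn)(void *ctx);

/*
 * Poll until either the pending bit or the update-period bit is zero.
 * Returns the number of polls that saw both bits set, or -1 if the
 * poll budget ran out first.
 */
static int wait_flip_latched(read_status_fn read_status, void *ctx,
                             unsigned int max_polls)
{
    unsigned int i;

    for (i = 0; i < max_polls; i++) {
        uint32_t status = read_status(ctx);

        if (!(status & FLIP_PENDING) || !(status & IN_UPDATE_PERIOD))
            return (int)i;
    }
    return -1;
}

/* Fake register for testing: the pending bit clears after N reads. */
struct fake_reg {
    unsigned int polls_until_clear;
};

static uint32_t fake_read_status(void *ctx)
{
    struct fake_reg *reg = ctx;

    if (reg->polls_until_clear == 0)
        return IN_UPDATE_PERIOD;        /* pending bit is now zero */
    reg->polls_until_clear--;
    return FLIP_PENDING | IN_UPDATE_PERIOD;
}
```

Even with a bound, the time spent in the loop depends on how late the handler runs relative to the frame, which is the uncertainty the page flip interrupt avoids.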

In addition, using the pflip interrupt enables us to sync to the
hblank as well, or not to sync at all, by just changing a few register
bits. It's also a prerequisite for switching to a non-constant sync
rate. So I would like to keep it and try to fix the issues we are seeing
instead.

> If not, another issue I encountered in 3.15 is that
> radeon_crtc_handle_flip() is called unconditionally when a page flip
> interrupt arrives. If the flip was already handled (presumably from the
> vertical blank interrupt), the BUG_ON() in drm_vblank_put() triggers a
> panic. This happened to me with weston.
Calling radeon_crtc_handle_flip multiple times shouldn't be a problem;
that can happen with the old code as well. Setting unpin_work to NULL
under a spin lock protects us from that case.
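
The pattern referred to here can be sketched in userspace as follows. This is a minimal illustration only, not the radeon implementation: a pthread mutex stands in for the spin lock, and the struct layout and function name are hypothetical. The point is that the pointer is taken and cleared in the same critical section, so only one caller ever sees a non-NULL value.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct flip_work {
    int completed;                  /* how often completion ran */
};

struct crtc {
    pthread_mutex_t lock;           /* stands in for the spin lock */
    struct flip_work *unpin_work;   /* pending flip, NULL if none */
};

/* Returns 1 if this call completed the flip, 0 if none was pending. */
static int handle_flip(struct crtc *crtc)
{
    struct flip_work *work;

    pthread_mutex_lock(&crtc->lock);
    work = crtc->unpin_work;
    crtc->unpin_work = NULL;        /* claim the work under the lock */
    pthread_mutex_unlock(&crtc->lock);

    if (work == NULL)
        return 0;                   /* already handled elsewhere */

    work->completed++;              /* completion runs exactly once */
    return 1;
}
```

A second call, whether from the vblank or the page flip interrupt, finds NULL and returns without touching the work item again.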

But take a look at the 3.15 version of radeon_crtc_page_flip instead!
We first set "unpin_work", release the spin lock and *then* reserve and
pin the BO. If I'm not completely wrong, there is a race condition here:
when the vblank interrupt happens before the rest of the function has
run, all kinds of bad things can happen.

The only thing preventing that is that the vblank interrupt is
turned on only at the end of the function, but the vblank interrupt can
already have been turned on earlier for other reasons as well.
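
One way to close such a window, sketched below in userspace, is to finish the preparation first and publish the work item last. This is only an illustration of the ordering, under the assumption that nothing else needs unpin_work to be visible early; it is not the actual 3.15 or 3.16 code, and pin_bo(), unpin_bo() and queue_flip() are hypothetical names.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct flip_work {
    int bo_pinned;
};

struct crtc {
    pthread_mutex_t lock;           /* stands in for the spin lock */
    struct flip_work *unpin_work;   /* visible to the irq handler */
};

/* Placeholders for reserving and pinning the buffer object. */
static int pin_bo(struct flip_work *work)
{
    work->bo_pinned = 1;
    return 0;
}

static void unpin_bo(struct flip_work *work)
{
    work->bo_pinned = 0;
}

/* Returns 0 on success, -EBUSY if another flip is still pending. */
static int queue_flip(struct crtc *crtc, struct flip_work *work)
{
    int ret = pin_bo(work);         /* prepare before publishing */

    if (ret)
        return ret;

    pthread_mutex_lock(&crtc->lock);
    if (crtc->unpin_work != NULL) {
        pthread_mutex_unlock(&crtc->lock);
        unpin_bo(work);             /* undo the preparation */
        return -EBUSY;
    }
    crtc->unpin_work = work;        /* handler only sees ready work */
    pthread_mutex_unlock(&crtc->lock);
    return 0;
}
```

With this ordering an interrupt that fires at any point either sees NULL or a fully prepared work item, independent of when the vblank interrupt was enabled.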

> This is presumably not an issue in 3.16 because radeon_crtc_handle_flip()
> now bails early if radeon_crtc->flip_work == NULL.

Thanks,
Christian.