On Tue, Jun 04, 2019 at 11:45:43PM -0300, Rodrigo Siqueira wrote: > After the commit def35e7c5926 ("drm/vkms: Bugfix extra vblank frame") > some of the crc tests started to fail in the vkms with the following > error: > > [drm:drm_crtc_add_crc_entry [drm]] *ERROR* Overflow of CRC buffer, > userspace reads too slow. > [drm] failed to queue vkms_crc_work_handle > ... > > The aforementioned commit fixed the extra vblank added by > `drm_crtc_arm_vblank_event()` which is invoked inside > `vkms_crtc_atomic_flush()` if the vblank event count was zero, otherwise > `drm_crtc_send_vblank_event()` is invoked. The fix was implemented in > `vkms_get_vblank_timestamp()` by subtracting one period from the current > timestamp, as the code snippet below illustrates: > > if (!in_vblank_irq) > *vblank_time -= output->period_ns; > > The above fix works well when `drm_crtc_arm_vblank_event()` is invoked. > However, it does not properly work when `drm_crtc_send_vblank_event()` > executes since it subtracts the correct timestamp, which it shouldn't. > In this case, the `drm_crtc_accurate_vblank_count()` function will > returns the wrong frame number, which generates the aforementioned > error. Such decrease in `get_vblank_timestamp()` produce a negative > number in the following calculation within `drm_update_vblank_count()`: > > u64 diff_ns = ktime_to_ns(ktime_sub(t_vblank, vblank->time)); > > After this operation, the DIV_ROUND_CLOSEST_ULL macro is invoked using > diff_ns with a negative number, which generates an undefined result; > therefore, the returned frame is a huge and incorrect number. Finally, > the code below is part of the `vkms_crc_work_handle()`, note that the > while loop depends on the returned value from > `drm_crtc_accurate_vblank_count()` which may cause the loop to take a > long time to finish in case of huge value. > > frame_end = drm_crtc_accurate_vblank_count(crtc); > while (frame_start <= frame_end) > drm_crtc_add_crc_entry(crtc, true, frame_start++, &crc32); > > This commit fixes this issue by checking if the vblank timestamp > corresponding to the current software vblank counter is equal to the > current vblank; if they are equal, it means that > `drm_crtc_send_vblank_event()` was invoked and vkms does not need to > discount the extra vblank, otherwise, `drm_crtc_arm_vblank_event()` was > executed and vkms have to discount the extra vblank. This fix made the > CRC tests work again whereas keep all tests from kms_flip working as > well. > > V2: Update commit message > > Signed-off-by: Rodrigo Siqueira <rodrigosiqueiramelo@xxxxxxxxx> > Signed-off-by: Shayenne Moura <shayenneluzmoura@xxxxxxxxx> Thanks a lot for typing up this commit message. I'm still not following what's going on (this stuff is tricky), but now I think I can at least ask useful scenarios. For my understanding: Things go wrong when in the function vkms_crtc_atomic_flush() the call to drm_crtc_vblank_get() returns 0, and we call drm_crtc_send_vblank_event() directly? I'm not 100% from your description above whether that's the failure case where everything blows up. The other part I'm having a hard time understanding: If we are in the case where we call drm_crtc_send_vblank_event() directly, how do we end up in the vkms_get_vblank_timestamp(). >From what I can see there's not caller to where we sample a new vblank ... I think a full backtrace that hits the new condition you're adding to understand when exactly we're hitting it would be perfect. > --- > drivers/gpu/drm/vkms/vkms_crtc.c | 4 ++++ > 1 file changed, 4 insertions(+) > > diff --git a/drivers/gpu/drm/vkms/vkms_crtc.c b/drivers/gpu/drm/vkms/vkms_crtc.c > index 7508815fac11..3ce60e66673e 100644 > --- a/drivers/gpu/drm/vkms/vkms_crtc.c > +++ b/drivers/gpu/drm/vkms/vkms_crtc.c > @@ -74,9 +74,13 @@ bool vkms_get_vblank_timestamp(struct drm_device *dev, unsigned int pipe, > { > struct vkms_device *vkmsdev = drm_device_to_vkms_device(dev); > struct vkms_output *output = &vkmsdev->output; > + struct drm_vblank_crtc *vblank = &dev->vblank[pipe]; > > *vblank_time = output->vblank_hrtimer.node.expires; > > + if (*vblank_time == vblank->time) > + return true; Ok, I think with the above I do now have a rough idea what's going wrong. I think that would be a bug in the drm_vblank.c. Or at least it could be, I think I first need to better understand still what's going on here to decided that. I have a bit a hunch this is fallout from our vblank fudging, but I guess we'll see. It's definitely clear that things blow up if the vblank time somehow goes backwards, but I don't understand yet how that's possible. -Daniel > + > if (!in_vblank_irq) > *vblank_time -= output->period_ns; > > -- > 2.21.0 > -- Daniel Vetter Software Engineer, Intel Corporation http://blog.ffwll.ch _______________________________________________ dri-devel mailing list dri-devel@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/dri-devel