On Tue, Feb 09, 2016 at 12:07:27PM +0200, Ville Syrjälä wrote:
> On Tue, Feb 09, 2016 at 10:56:38AM +0100, Daniel Vetter wrote:
> > On Mon, Feb 08, 2016 at 02:13:25AM +0100, Mario Kleiner wrote:
> > > This fixes a regression introduced by the new drm_update_vblank_count()
> > > implementation in Linux 4.4:
> > >
> > > Restrict the bump of the software vblank counter in drm_update_vblank_count()
> > > to a safe maximum value of +1 whenever there is the possibility that
> > > concurrent readers of vblank timestamps could be active at the moment,
> > > as the current implementation of the timestamp caching and updating is
> > > not safe against concurrent readers for calls to store_vblank() with a
> > > bump of anything but +1. A bump != 1 would very likely return corrupted
> > > timestamps to userspace, because the same slot in the cache could
> > > be concurrently written by store_vblank() and read by one of those
> > > readers in a non-atomic fashion and without the read-retry logic
> > > detecting this collision.
> > >
> > > Concurrent readers can exist while drm_update_vblank_count() is called
> > > from the drm_vblank_off() or drm_vblank_on() functions or other non-vblank-
> > > irq callers. However, all those calls are happening with the vbl_lock
> > > locked thereby preventing a drm_vblank_get(), so the vblank refcount
> > > can't increase while drm_update_vblank_count() is executing. Therefore
> > > a zero vblank refcount during execution of that function signals that
> > > it is safe for arbitrary counter bumps if called from outside vblank irq,
> > > whereas a non-zero count is not safe.
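The slot collision described above can be illustrated with a toy model. It assumes a timestamp ring indexed by the vblank count modulo the number of slots; the ring size is not stated in this thread, so it is a parameter here, and reader_slot(), writer_slot() and undetected_tear_possible() are hypothetical names, not the drm_irq.c code. The writer fills the slot for count + inc before it publishes the new count, so a reader that sampled the old count and re-checks it sees no change; if both sides used the same slot, the torn read goes unnoticed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Toy model (ASSUMPTION): a hand-rolled timestamp cache with ts_slots
 * entries, indexed by (count % ts_slots). The ring size is not given in
 * the thread, so it is a parameter. All names are hypothetical.
 */

/* Slot a lockless reader uses after sampling the published count. */
static uint32_t reader_slot(uint32_t count, uint32_t ts_slots)
{
	return count % ts_slots;
}

/* Slot the writer fills *before* it publishes count + inc. */
static uint32_t writer_slot(uint32_t count, uint32_t inc, uint32_t ts_slots)
{
	return (count + inc) % ts_slots;
}

/*
 * While the writer is filling its slot, the published count is still
 * `count`, so a reader's re-check of the count sees no change. If both
 * sides use the same slot, a half-written timestamp can be returned
 * without the retry loop noticing.
 */
static bool undetected_tear_possible(uint32_t count, uint32_t inc,
				     uint32_t ts_slots)
{
	assert(ts_slots > 0); /* avoid division by zero in the modulo */
	return writer_slot(count, inc, ts_slots) == reader_slot(count, ts_slots);
}
```

For example, with two slots and a published count of 41, a bump of +1 writes the other slot, while +0 and +2 land on the slot readers are using. Whether larger bumps collide depends on the ring size, but +1 is the only bump that avoids the readers' slot for every ring size of two or more, which is consistent with clamping the bump to +1.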
> > >
> > > Whenever the function is called from vblank irq, we have to assume concurrent
> > > readers could show up any time during its execution, even if the refcount
> > > is currently zero, as vblank irqs are usually only enabled due to the
> > > presence of readers, and because when it is called from vblank irq it
> > > can't hold the vbl_lock to protect it from sudden bumps in vblank refcount.
> > > Therefore also restrict bumps to +1 when the function is called from vblank
> > > irq.
> > >
> > > Such bumps of more than +1 can happen at other times than reenabling
> > > vblank irqs, e.g., when regular vblank interrupts get delayed by more
> > > than 1 frame due to long held locks, long irq off periods, realtime
> > > preemption on RT kernels, or system management interrupts.
> > >
> > > Signed-off-by: Mario Kleiner <mario.kleiner.de@xxxxxxxxx>
> > > Cc: <stable@xxxxxxxxxxxxxxx> # 4.4+
> > > Cc: michel@xxxxxxxxxxx
> > > Cc: vbabka@xxxxxxx
> > > Cc: ville.syrjala@xxxxxxxxxxxxxxx
> > > Cc: daniel.vetter@xxxxxxxx
> > > Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
> > > Cc: alexander.deucher@xxxxxxx
> > > Cc: christian.koenig@xxxxxxx
> >
> > Imo this is duct-tape. If we want to fix this up properly I think we
> > should just use a full-blown seqlock instead of our hand-rolled one. And
> > that could handle any increment at all.
>
> And I even fixed this [1] almost a half a year ago when I sent the
> original series, but that part got held hostage to the same seqlock
> argument. Perfect is the enemy of good.
>
> [1] https://lists.freedesktop.org/archives/intel-gfx/2015-September/075879.html

Hm yeah, that does suffer from reinventing seqlocks. But I'd prefer your
patch over Mario's hack here tbh. Your patch with seqlock would be even
more awesome.
-Daniel

>
> > -Daniel
> >
> > > ---
> > >  drivers/gpu/drm/drm_irq.c | 41 +++++++++++++++++++++++++++++++++++++++++
> > >  1 file changed, 41 insertions(+)
> > >
> > > diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> > > index bcb8528..aa2c74b 100644
> > > --- a/drivers/gpu/drm/drm_irq.c
> > > +++ b/drivers/gpu/drm/drm_irq.c
> > > @@ -221,6 +221,47 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
> > >  		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
> > >  	}
> > >  
> > > +	/*
> > > +	 * Restrict the bump of the software vblank counter to a safe maximum
> > > +	 * value of +1 whenever there is the possibility that concurrent readers
> > > +	 * of vblank timestamps could be active at the moment, as the current
> > > +	 * implementation of the timestamp caching and updating is not safe
> > > +	 * against concurrent readers for calls to store_vblank() with a bump
> > > +	 * of anything but +1. A bump != 1 would very likely return corrupted
> > > +	 * timestamps to userspace, because the same slot in the cache could
> > > +	 * be concurrently written by store_vblank() and read by one of those
> > > +	 * readers without the read-retry logic detecting the collision.
> > > +	 *
> > > +	 * Concurrent readers can exist when we are called from the
> > > +	 * drm_vblank_off() or drm_vblank_on() functions and other non-vblank-
> > > +	 * irq callers. However, all those calls to us are happening with the
> > > +	 * vbl_lock locked to prevent drm_vblank_get(), so the vblank refcount
> > > +	 * can't increase while we are executing. Therefore a zero refcount at
> > > +	 * this point is safe for arbitrary counter bumps if we are called
> > > +	 * outside vblank irq, a non-zero count is not 100% safe. Unfortunately
> > > +	 * we must also accept a refcount of 1, as whenever we are called from
> > > +	 * drm_vblank_get() -> drm_vblank_enable() the refcount will be 1 and
> > > +	 * we must let that one pass through in order to not lose vblank counts
> > > +	 * during vblank irq off - which would completely defeat the whole
> > > +	 * point of this routine.
> > > +	 *
> > > +	 * Whenever we are called from vblank irq, we have to assume concurrent
> > > +	 * readers exist or can show up any time during our execution, even if
> > > +	 * the refcount is currently zero, as vblank irqs are usually only
> > > +	 * enabled due to the presence of readers, and because when we are called
> > > +	 * from vblank irq we can't hold the vbl_lock to protect us from sudden
> > > +	 * bumps in vblank refcount. Therefore also restrict bumps to +1 when
> > > +	 * called from vblank irq.
> > > +	 */
> > > +	if ((diff > 1) && (atomic_read(&vblank->refcount) > 1 ||
> > > +	    (flags & DRM_CALLED_FROM_VBLIRQ))) {
> > > +		DRM_DEBUG_VBL("clamping vblank bump to 1 on crtc %u: diffr=%u "
> > > +			      "refcount %u, vblirq %u\n", pipe, diff,
> > > +			      atomic_read(&vblank->refcount),
> > > +			      (flags & DRM_CALLED_FROM_VBLIRQ) != 0);
> > > +		diff = 1;
> > > +	}
> > > +
> > >  	DRM_DEBUG_VBL("updating vblank count on crtc %u:"
> > >  		      " current=%u, diff=%u, hw=%u hw_last=%u\n",
> > >  		      pipe, vblank->count, diff, cur_vblank, vblank->last);
> > > --
> > > 1.9.1
> > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
>
> --
> Ville Syrjälä
> Intel OTC

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
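Daniel's alternative, a full-blown seqlock covering both the count and its timestamp, might look roughly like the following. This is a hypothetical user-space C11 sketch, not the kernel's seqlock_t API and not Ville's patch: struct vbl_state, vbl_init(), vbl_store() and vbl_read() are invented names, a single writer is assumed (concurrent writers would need a separate lock), and it caches one timestamp instead of a ring. The writer makes the sequence number odd, updates count and timestamp together, then makes it even again; a reader retries whenever it sees an odd value or the value changed across its reads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * HYPOTHETICAL sketch: one sequence counter guards both the vblank count
 * and its timestamp, so the writer may bump the count by any amount.
 * User-space C11 atomics, not the kernel seqlock_t API.
 * ASSUMPTION: a single writer (concurrent writers need their own lock).
 */
struct vbl_state {
	atomic_uint seq;          /* odd while a write is in progress */
	_Atomic uint32_t count;   /* software vblank counter */
	_Atomic uint64_t time_ns; /* timestamp of the vblank at `count` */
};

static void vbl_init(struct vbl_state *v)
{
	atomic_init(&v->seq, 0u);
	atomic_init(&v->count, 0u);
	atomic_init(&v->time_ns, 0u);
}

/* Writer: publish count += inc together with its timestamp. */
static void vbl_store(struct vbl_state *v, uint32_t inc, uint64_t t_ns)
{
	unsigned int s = atomic_load_explicit(&v->seq, memory_order_relaxed);

	assert((s & 1u) == 0); /* single writer: no write may be in flight */
	atomic_store_explicit(&v->seq, s + 1, memory_order_relaxed);
	/* Order the odd sequence number before the data stores. */
	atomic_thread_fence(memory_order_release);

	uint32_t c = atomic_load_explicit(&v->count, memory_order_relaxed);
	atomic_store_explicit(&v->count, c + inc, memory_order_relaxed);
	atomic_store_explicit(&v->time_ns, t_ns, memory_order_relaxed);

	/* Release: readers that see the even value also see the data. */
	atomic_store_explicit(&v->seq, s + 2, memory_order_release);
}

/* Reader: returns the count and stores the matching timestamp in *t_ns. */
static uint32_t vbl_read(struct vbl_state *v, uint64_t *t_ns)
{
	for (;;) {
		unsigned int s1 = atomic_load_explicit(&v->seq, memory_order_acquire);

		if (s1 & 1u)
			continue; /* writer in progress, retry */

		uint32_t c = atomic_load_explicit(&v->count, memory_order_relaxed);
		uint64_t t = atomic_load_explicit(&v->time_ns, memory_order_relaxed);

		/* Order the data loads before re-reading the sequence number. */
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load_explicit(&v->seq, memory_order_relaxed) == s1) {
			*t_ns = t;
			return c;
		}
		/* The sequence number moved: a write overlapped, retry. */
	}
}
```

Since the reader's retry decision depends only on the sequence number, not on which slot was written, vbl_store(&v, 5, t) is as safe as vbl_store(&v, 1, t), which is the property Daniel describes as being able to "handle any increment at all". In the kernel, the corresponding primitives would be write_seqlock()/write_sequnlock() on the write side and read_seqbegin()/read_seqretry() on the read side.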