Re: Funky new vblank counter regressions in Linux 4.4-rc1

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 11/25/2015 08:36 PM, Ville Syrjälä wrote:
On Wed, Nov 25, 2015 at 08:04:26PM +0100, Mario Kleiner wrote:
On 11/25/2015 06:46 PM, Ville Syrjälä wrote:

...

Attached is my current patch i wanted to submit for the drm core's
drm_update_vblank_count(). I think it's good to make the core somewhat
robust against potential kms driver bugs or glitches. But if you
wouldn't like that patch, there wouldn't be much of a point sending it
out at all.

thanks,
-mario


>From 2d5d58a1c575ad002ce2cb643f395d0e4757d959 Mon Sep 17 00:00:00 2001
From: Mario Kleiner <mario.kleiner.de@xxxxxxxxx>
Date: Wed, 25 Nov 2015 18:48:31 +0100
Subject: [PATCH] drm/irq: Make drm_update_vblank_count() more robust.

The changes to drm_update_vblank_count() for Linux 4.4-rc
made the function more fragile wrt. some hw quirks. E.g.,
at dev->driver->enable_vblank(), AMD gpu's fire a spurious
redundant vblank irq shortly after enabling vblank irqs, not
locked to vblank. This causes a redundant call which needs
to be suppressed to avoid miscounting.

To increase robustness, shuffle things around a bit:

On drivers with high precision vblank timestamping always
evaluate the timestamp difference between current timestamp
and previously recorded timestamp to detect such redundant
invocations and no-op in that case.

Also detect and warn about timestamps going backwards to
catch potential kms driver bugs.

This patch is meant for Linux 4.4-rc and later.

Signed-off-by: Mario Kleiner <mario.kleiner.de@xxxxxxxxx>
---
  drivers/gpu/drm/drm_irq.c | 53 ++++++++++++++++++++++++++++++++++++-----------
  1 file changed, 41 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index 819b8c1..8728c3c 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -172,9 +172,11 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
  				    unsigned long flags)
  {
  	struct drm_vblank_crtc *vblank = &dev->vblank[pipe];
-	u32 cur_vblank, diff;
+	u32 cur_vblank, diff = 0;
  	bool rc;
  	struct timeval t_vblank;
+	const struct timeval *t_old;
+	u64 diff_ns;
  	int count = DRM_TIMESTAMP_MAXRETRIES;
  	int framedur_ns = vblank->framedur_ns;

@@ -195,13 +197,15 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
  		rc = drm_get_last_vbltimestamp(dev, pipe, &t_vblank, flags);
  	} while (cur_vblank != dev->driver->get_vblank_counter(dev, pipe) && --count > 0);

-	if (dev->max_vblank_count != 0) {
-		/* trust the hw counter when it's around */
-		diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
-	} else if (rc && framedur_ns) {
-		const struct timeval *t_old;
-		u64 diff_ns;
-
+	/*
+	 * Always use vblank timestamping based method if supported to reject
+	 * redundant vblank irqs. E.g., AMD hardware needs this to not screw up
+	 * due to some irq handling quirk.
+	 *
+	 * This also sets the diff value for use as fallback below in case the
+	 * hw does not support a suitable hw vblank counter.
+	 */
+	if (rc && framedur_ns) {

If you fudged everything properly why do you still need this? With
working hw counter there should be no need to do this stuff.


As far as testing on one DCE4 card goes, i don't need it anymore with my fudged hw counters and timestamps. The fudging so far seems to work nicely. I just wanted to have a bit of extra robustness and a bit of extra available debug output against future or other broken drivers, or mistakes in fudging on the current driver, e.g., against things like timestamps going backwards. Especially since i can only test on two AMD cards atm., quite a limited sample. There are 3 display engine generations before and 5 generations after my test sample.

-mario


  		t_old = &vblanktimestamp(dev, pipe, vblank->count);
  		diff_ns = timeval_to_ns(&t_vblank) - timeval_to_ns(t_old);

@@ -212,11 +216,36 @@ static void drm_update_vblank_count(struct drm_device *dev, unsigned int pipe,
  		 */
  		diff = DIV_ROUND_CLOSEST_ULL(diff_ns, framedur_ns);

-		if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ)
+		/* Catch driver timestamping bugs and prevent worse. */
+		if ((s32) diff < 0) {
+			DRM_DEBUG_VBL("crtc %u: Timestamp going backward!"
+			" diff_ns = %lld, framedur_ns = %d)\n",
+			pipe, (long long) diff_ns, framedur_ns);
+			return;
+		}
+
+		if (diff == 0 && flags & DRM_CALLED_FROM_VBLIRQ) {
  			DRM_DEBUG_VBL("crtc %u: Redundant vblirq ignored."
-				      " diff_ns = %lld, framedur_ns = %d)\n",
-				      pipe, (long long) diff_ns, framedur_ns);
-	} else {
+			" diff_ns = %lld, framedur_ns = %d)\n",
+			pipe, (long long) diff_ns, framedur_ns);
+
+			/* Treat this redundant invocation as no-op. */
+			WARN_ON_ONCE(cur_vblank != vblank->last);
+			return;
+		}
+	}
+
+	/*
+	 * hw counters, if marked as supported via max_vblank_count != 0,
+	 * *must* increment at leading edge of vblank or at least before
+	 * the firing of the hw vblank irq, otherwise we have a race here
+	 * between gpu and us, where small variations in our execution
+	 * timing can cause off-by-one miscounting of vblanks.
+	 */
+	if (dev->max_vblank_count != 0) {
+		/* trust the hw counter when it's around */
+		diff = (cur_vblank - vblank->last) & dev->max_vblank_count;
+	} else if (!(rc && framedur_ns)) {
  		/* some kind of default for drivers w/o accurate vbl timestamping */
  		diff = (flags & DRM_CALLED_FROM_VBLIRQ) != 0;
  	}
--
1.9.1



_______________________________________________
dri-devel mailing list
dri-devel@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/dri-devel




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux