Re: [PATCH 02/20] drm/i915: Delay queuing hangcheck to wait-request

Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxxxxxxxx> · Fri, 1 Jul 2016 16:32:16 +0100

On 01/07/16 12:22, Chris Wilson wrote:
We can forgo queuing the hangcheck from the start of every request to
until we wait upon a request. This reduces the overhead of every
request, but may increase the latency of detecting a hang. Howeever, if
nothing every waits upon a hang, did it ever hang? It also improves the
robustness of the wait-request by ensuring that the hangchecker is
indeed running before we sleep indefinitely (and thereby ensuring that
we never actually sleep forever waiting for a dead GPU).

Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
---
  drivers/gpu/drm/i915/i915_gem.c |  9 +++++----
  drivers/gpu/drm/i915/i915_irq.c | 10 ++++------
  2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1d9878258103..34f724cc40b8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1532,6 +1532,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
  			break;
  		}

+		/* Ensure that even if the GPU hangs, we get woken up. */
+		i915_queue_hangcheck(dev_priv);
+
  		timer.function = NULL;
  		if (timeout || missed_irq(dev_priv, engine)) {
  			unsigned long expire;
@@ -2919,8 +2922,6 @@ void __i915_add_request(struct drm_i915_gem_request *request,
  	/* Not allowed to fail! */
  	WARN(ret, "emit|add_request failed: %d!\n", ret);

-	i915_queue_hangcheck(engine->i915);
-
  	queue_delayed_work(dev_priv->wq,
  			   &dev_priv->mm.retire_work,
  			   round_jiffies_up_relative(HZ));
@@ -3264,8 +3265,8 @@ i915_gem_retire_requests(struct drm_i915_private *dev_priv)

  	if (idle)
  		mod_delayed_work(dev_priv->wq,
-				   &dev_priv->mm.idle_work,
-				   msecs_to_jiffies(100));
+				 &dev_priv->mm.idle_work,
+				 msecs_to_jiffies(100));

  	return idle;
  }
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 4378a659d962..5614582ca240 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -3135,10 +3135,10 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
  	intel_uncore_arm_unclaimed_mmio_detection(dev_priv);

  	for_each_engine_id(engine, dev_priv, id) {
+		bool busy = waitqueue_active(&engine->irq_queue);
  		u64 acthd;
  		u32 seqno;
  		unsigned user_interrupts;
-		bool busy = true;

  		semaphore_clear_deadlocks(dev_priv);

@@ -3161,12 +3161,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
  		if (engine->hangcheck.seqno == seqno) {
  			if (ring_idle(engine, seqno)) {
  				engine->hangcheck.action = HANGCHECK_IDLE;
-				if (waitqueue_active(&engine->irq_queue)) {, the
+				if (busy) {
  					/* Safeguard against driver failure */
  					user_interrupts = kick_waiters(engine);
  					engine->hangcheck.score += BUSY;
-				} else
-					busy = false;
+				}
  			} else {
  				/* We always increment the hangcheck score
  				 * if the ring is busy and still processing
@@ -3240,9 +3239,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
  		goto out;
  	}

+	/* Reset timer in case GPU hangs without another request being added */
  	if (busy_count)
-		/* Reset timer case chip hangs without another request
-		 * being added */
  		i915_queue_hangcheck(dev_priv);

  out:


I thought I see a problem here but I was just confused. I think it is 
OK. Just won't re-queue the hangcheck if no one is waiting and no new 
requests get submitted. It is unlikely that would cause a problem in 
practice. It sounds very unlucky that the last submitted request ever 
hangs. Balance with the benefit of not running while GPU is processing 
stuff I think we can give it a go.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx