On 01/07/16 12:22, Chris Wilson wrote:
We can forgo queuing the hangcheck from the start of every request to until we wait upon a request. This reduces the overhead of every request, but may increase the latency of detecting a hang. Howeever, if nothing every waits upon a hang, did it ever hang? It also improves the robustness of the wait-request by ensuring that the hangchecker is indeed running before we sleep indefinitely (and thereby ensuring that we never actually sleep forever waiting for a dead GPU). Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> --- drivers/gpu/drm/i915/i915_gem.c | 9 +++++---- drivers/gpu/drm/i915/i915_irq.c | 10 ++++------ 2 files changed, 9 insertions(+), 10 deletions(-) diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 1d9878258103..34f724cc40b8 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -1532,6 +1532,9 @@ int __i915_wait_request(struct drm_i915_gem_request *req, break; } + /* Ensure that even if the GPU hangs, we get woken up. */ + i915_queue_hangcheck(dev_priv); + timer.function = NULL; if (timeout || missed_irq(dev_priv, engine)) { unsigned long expire; @@ -2919,8 +2922,6 @@ void __i915_add_request(struct drm_i915_gem_request *request, /* Not allowed to fail! */ WARN(ret, "emit|add_request failed: %d!\n", ret); - i915_queue_hangcheck(engine->i915); - queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, round_jiffies_up_relative(HZ)); @@ -3264,8 +3265,8 @@ i915_gem_retire_requests(struct drm_i915_private *dev_priv) if (idle) mod_delayed_work(dev_priv->wq, - &dev_priv->mm.idle_work, - msecs_to_jiffies(100)); + &dev_priv->mm.idle_work, + msecs_to_jiffies(100)); return idle; } diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c index 4378a659d962..5614582ca240 100644 --- a/drivers/gpu/drm/i915/i915_irq.c +++ b/drivers/gpu/drm/i915/i915_irq.c @@ -3135,10 +3135,10 @@ static void i915_hangcheck_elapsed(struct work_struct *work) intel_uncore_arm_unclaimed_mmio_detection(dev_priv); for_each_engine_id(engine, dev_priv, id) { + bool busy = waitqueue_active(&engine->irq_queue); u64 acthd; u32 seqno; unsigned user_interrupts; - bool busy = true; semaphore_clear_deadlocks(dev_priv); @@ -3161,12 +3161,11 @@ static void i915_hangcheck_elapsed(struct work_struct *work) if (engine->hangcheck.seqno == seqno) { if (ring_idle(engine, seqno)) { engine->hangcheck.action = HANGCHECK_IDLE; - if (waitqueue_active(&engine->irq_queue)) {, the + if (busy) { /* Safeguard against driver failure */ user_interrupts = kick_waiters(engine); engine->hangcheck.score += BUSY; - } else - busy = false; + } } else { /* We always increment the hangcheck score * if the ring is busy and still processing @@ -3240,9 +3239,8 @@ static void i915_hangcheck_elapsed(struct work_struct *work) goto out; } + /* Reset timer in case GPU hangs without another request being added */ if (busy_count) - /* Reset timer case chip hangs without another request - * being added */ i915_queue_hangcheck(dev_priv); out:
I thought I see a problem here but I was just confused. I think it is OK. Just won't re-queue the hangcheck if no one is waiting and no new requests get submitted. It is unlikely that would cause a problem in practice. It sounds very unlucky that the last submitted request ever hangs. Balance with the benefit of not running while GPU is processing stuff I think we can give it a go.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx> Regards, Tvrtko _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx