On Mon, Feb 27, 2017 at 02:06:58PM +0000, Chris Wilson wrote: > On Mon, Feb 27, 2017 at 01:57:35PM +0000, Tvrtko Ursulin wrote: > > > > On 27/02/2017 13:24, Chris Wilson wrote: > > > if (b->hangcheck_interrupts != atomic_read(&engine->irq_count)) { > > >@@ -67,7 +76,7 @@ static void intel_breadcrumbs_hangcheck(unsigned long data) > > > * to process the pending interrupt (e.g, low priority task on a loaded > > > * system) and wait until it sleeps before declaring a missed interrupt. > > > */ > > >- if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ACTIVE) { > > >+ if (!(intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP)) { > > > > I did not get the explanation from the previous round on why you had > > to reverse the active to asleep. Here it even looks wrong now, > > because I thought you don't want to re-queue the hangcheck when > > there are no waiters? > > No waiters: result = 0 > Running waiter: result = WAKEUP_WAITER > Sleeping waiter: result = WAKEUP_WAITER | WAKEUP_ASLEEP > > We only want to declare a mised irq if we wake up a sleeping waiter, and > keep the hangcheck timer running until the device is idle (when the irq > is disarmed). > > How about: > > if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP) { > set_bit(missed_irq); > mod_timer(&b->fake_irq, jiffies + 1); > } else { > mod_timer(b->hangcheck, wait_timeout); > } diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c index ebb7bc0be9fb..004eb4c0c531 100644 --- a/drivers/gpu/drm/i915/intel_breadcrumbs.c +++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c @@ -72,18 +72,25 @@ static void intel_breadcrumbs_hangcheck(unsigned long data) return; } - /* If the waiter was currently running, assume it hasn't had a chance + /* We keep the hangcheck time alive until we disarm the irq, even + * if there are no waiters at present. + * + * If the waiter was currently running, assume it hasn't had a chance * to process the pending interrupt (e.g, low priority task on a loaded * system) and wait until it sleeps before declaring a missed interrupt. + * + * If the waiter was asleep (and not even pending a wakeup), then we + * must have missed an interrupt as the GPU has stopped advancing + * but we still have a waiter. Assuming all batches complete within + * DRM_I915_HANGCHECK_JIFFIES [1.5s]! */ - if (!(intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP)) { + if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP) { + DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name); + set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings); + mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1); + } else { mod_timer(&b->hangcheck, wait_timeout()); - return; } - - DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name); - set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings); - mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1); } static void intel_breadcrumbs_fake_irq(unsigned long data) @@ -167,6 +174,10 @@ void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine) spin_lock_irqsave(&b->lock, flags); + /* We only disarm the irq when we are idle (all requests completed), + * so if there remains a sleeping waiter, it missed the request + * completion. + */ if (__intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP) set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings); -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx