On 27/02/2017 14:13, Chris Wilson wrote:
On Mon, Feb 27, 2017 at 02:06:58PM +0000, Chris Wilson wrote:
On Mon, Feb 27, 2017 at 01:57:35PM +0000, Tvrtko Ursulin wrote:
On 27/02/2017 13:24, Chris Wilson wrote:
if (b->hangcheck_interrupts != atomic_read(&engine->irq_count)) {
@@ -67,7 +76,7 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
* to process the pending interrupt (e.g, low priority task on a loaded
* system) and wait until it sleeps before declaring a missed interrupt.
*/
- if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ACTIVE) {
+ if (!(intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP)) {
I did not get the explanation from the previous round on why you had
to reverse the active to asleep. Here it even looks wrong now,
because I thought you don't want to re-queue the hangcheck when
there are no waiters?
No waiters: result = 0
Running waiter: result = WAKEUP_WAITER
Sleeping waiter: result = WAKEUP_WAITER | WAKEUP_ASLEEP
We only want to declare a mised irq if we wake up a sleeping waiter, and
keep the hangcheck timer running until the device is idle (when the irq
is disarmed).
How about:
if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP) {
set_bit(missed_irq);
mod_timer(&b->fake_irq, jiffies + 1);
} else {
mod_timer(b->hangcheck, wait_timeout);
}
diff --git a/drivers/gpu/drm/i915/intel_breadcrumbs.c b/drivers/gpu/drm/i915/intel_breadcrumbs.c
index ebb7bc0be9fb..004eb4c0c531 100644
--- a/drivers/gpu/drm/i915/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/intel_breadcrumbs.c
@@ -72,18 +72,25 @@ static void intel_breadcrumbs_hangcheck(unsigned long data)
return;
}
- /* If the waiter was currently running, assume it hasn't had a chance
+ /* We keep the hangcheck time alive until we disarm the irq, even
+ * if there are no waiters at present.
+ *
+ * If the waiter was currently running, assume it hasn't had a chance
* to process the pending interrupt (e.g, low priority task on a loaded
* system) and wait until it sleeps before declaring a missed interrupt.
+ *
+ * If the waiter was asleep (and not even pending a wakeup), then we
+ * must have missed an interrupt as the GPU has stopped advancing
+ * but we still have a waiter. Assuming all batches complete within
+ * DRM_I915_HANGCHECK_JIFFIES [1.5s]!
*/
- if (!(intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP)) {
+ if (intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP) {
+ DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
+ set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
+ mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
+ } else {
mod_timer(&b->hangcheck, wait_timeout());
- return;
}
-
- DRM_DEBUG("Hangcheck timer elapsed... %s idle\n", engine->name);
- set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
- mod_timer(&engine->breadcrumbs.fake_irq, jiffies + 1);
}
static void intel_breadcrumbs_fake_irq(unsigned long data)
@@ -167,6 +174,10 @@ void intel_engine_disarm_breadcrumbs(struct intel_engine_cs *engine)
spin_lock_irqsave(&b->lock, flags);
+ /* We only disarm the irq when we are idle (all requests completed),
+ * so if there remains a sleeping waiter, it missed the request
+ * completion.
+ */
if (__intel_engine_wakeup(engine) & ENGINE_WAKEUP_ASLEEP)
set_bit(engine->id, &engine->i915->gpu_error.missed_irq_rings);
Looks okay after some re-re-re-reading. I guess shuffling things back
and forth in this series, not long after the area was recently changed
as well, was a bit challenging.
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@xxxxxxxxx>
Regards,
Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx