This patchset contains TDR and Watchdog Reset patches against a 3.13 drm-intel-nightly tree that is now about two weeks old. I have reworked the TDR and Watchdog Reset features to integrate them more closely with the existing TDR and scoring mechanism. This is still a work in progress and I am currently debugging a couple of issues, but I would like to get some early feedback.

Thanks,
Ian.

From f71a7de85e9d81be3aa3962c8fe2557235ff21c1 Mon Sep 17 00:00:00 2001
Message-Id: <cover.1387201899.git.ian.lister@xxxxxxxxx>
From: ian-lister <ian.lister@xxxxxxxxx>
Date: Mon, 16 Dec 2013 13:51:39 +0000
Subject: [RFC 00/13] TDR and Watchdog Reset

This patchset adds support for per-engine timeout detection and recovery (TDR) and adds batch-specific watchdog reset.

Per-ring TDR

The detection logic has been modified to detect hangs on individual engines and pass this information through to the recovery handler. Rather than performing a global reset, the handler will attempt a per-engine reset. The registers associated with the ring are saved and restored so that, when the ring restarts, it continues from the next instruction in the ring. For example, if it was executing an MI_BATCH_BUFFER_START command, it will advance to the next instruction, which is likely to be the mailbox updates and user interrupt. This means that no extra effort is required to deal with synchronisation. From the perspective of the driver, the batch buffer appears to have completed normally because all the normal signalling takes place; however, the context stats will have been updated to flag the guilty context.

Watchdog Reset

This is requested via flags to the batch buffer submission IOCTL. It is currently only supported for the render and video rings. The batch buffer command is surrounded by a hardware timer start command and a timer stop command. If the batch completes before the timer expires, the timer is cancelled and no interrupt is generated, so everything continues normally.
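To make the normal and hang paths of the watchdog concrete, here is a small C model of the command wrapping described above. It is a hypothetical simulation, not i915 code: the command names (CMD_WATCHDOG_START, CMD_BATCH, CMD_WATCHDOG_STOP), the tick-based timer and the helpers emit_watched_batch() and run_ring() are illustrative assumptions. They do not reflect the real register interface or command encodings.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a per-batch watchdog. The command names and the
 * tick-based timer are illustrative only; they are not i915 definitions.
 */
enum cmd { CMD_WATCHDOG_START, CMD_BATCH, CMD_WATCHDOG_STOP };

/* Emit the batch command wrapped between timer start and stop commands.
 * Returns the new write position in the ring. */
static int emit_watched_batch(enum cmd *ring, int cap, int pos)
{
    assert(pos + 3 <= cap);  /* room for all three commands */
    ring[pos++] = CMD_WATCHDOG_START;
    ring[pos++] = CMD_BATCH;
    ring[pos++] = CMD_WATCHDOG_STOP;
    return pos;
}

/*
 * Execute the ring. batch_ticks is how long the batch runs; a value larger
 * than timeout models a hang. Returns true if the watchdog expired, which
 * in the real feature raises an interrupt that triggers an engine reset.
 */
static bool run_ring(const enum cmd *ring, int len, unsigned int timeout,
                     unsigned int batch_ticks)
{
    bool armed = false;
    unsigned int remaining = 0;

    for (int i = 0; i < len; i++) {
        switch (ring[i]) {
        case CMD_WATCHDOG_START:
            armed = true;
            remaining = timeout;
            break;
        case CMD_BATCH:
            for (unsigned int t = 0; t < batch_ticks; t++) {
                if (armed && remaining == 0)
                    return true;  /* timer expired mid-batch */
                if (armed)
                    remaining--;
            }
            break;
        case CMD_WATCHDOG_STOP:
            armed = false;  /* cancelled before expiry: no interrupt */
            break;
        }
    }
    return false;
}
```

For example, with a timeout of 3 ticks, a batch that takes 3 ticks reaches the stop command and run_ring() returns false, whereas a 4-tick batch makes it return true. The model deliberately leaves out the recovery itself, which in this patchset is handled by the per-ring TDR code.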
However, if the batch hangs, the timer will generate an interrupt, which triggers an engine reset. This feature requires per-ring TDR to do the recovery work.

ian-lister (13):
  drm/i915: Periodic sampling for hang detection
  drm/i915: Improved hang detection logic
  drm/i915: Additional ring operations for TDR
  drm/i915: Force wake restore for TDR
  drm/i915: Per-engine recovery
  drm/i915: Communicating reset requests
  drm/i915: Additional debug for TDR
  drm/i915: TDR loose ends
  drm/i915: Watchdog timer support functions
  drm/i915: MI_LOAD_REGISTER_IMM fix
  drm/i915: Added watchdog interrupt handling
  drm/i915: Enabled watchdog timer interrupts
  drm/i915: Exec buffer inserts watchdog commands

 drivers/gpu/drm/i915/i915_debugfs.c        |  67 ++++
 drivers/gpu/drm/i915/i915_dma.c            |   3 +
 drivers/gpu/drm/i915/i915_drv.c            |  46 +++
 drivers/gpu/drm/i915/i915_drv.h            |  41 ++-
 drivers/gpu/drm/i915/i915_gem.c            |  26 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  30 +-
 drivers/gpu/drm/i915/i915_irq.c            | 531 ++++++++++++++++++---------
 drivers/gpu/drm/i915/i915_reg.h            |  21 ++
 drivers/gpu/drm/i915/intel_display.c       |  30 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 557 +++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h    |  53 +++
 drivers/gpu/drm/i915/intel_uncore.c        | 373 ++++++++++++++++++-
 include/drm/drmP.h                         |   7 +
 13 files changed, 1575 insertions(+), 210 deletions(-)

--
1.8.5.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx