Re: [PATCH 03/20] drm/i915: TDR / per-engine hang recovery support for gen8.

Arun Siluvery <arun.siluvery@xxxxxxxxxxxxxxx> writes:

> From: Tomas Elf <tomas.elf@xxxxxxxxx>
>
> TDR = Timeout Detection and Recovery.
>
> This change introduces support for TDR-style per-engine reset as an initial,
> less intrusive hang recovery option to be attempted before falling back to the
> legacy full GPU reset recovery mode if necessary. Initially we're only
> supporting gen8 but adding support for gen7 is straightforward since we've
> already established an extensible framework where gen7 support can be plugged
> in (add corresponding versions of intel_ring_enable, intel_ring_disable,
> intel_ring_save, intel_ring_restore, etc.).
>
> 1. Per-engine recovery vs. Full GPU recovery
>
> To capture the state of a single engine being detected as hung there is now a
> new flag for every engine that can be set once the decision has been made to
> schedule hang recovery for that particular engine. This patch only provides the
> hang recovery path but not the hang detection integration, so for now there is
> no way of detecting individual engines as hung and targeting that individual
> engine for per-engine hang recovery.
>
> The following algorithm is used to determine when to use which recovery mode
> given that hang detection has somehow detected a hang on an individual engine
> and given that per-engine hang recovery has been enabled (which it is not by
> default):
>
> 	1. The error handler checks all engines that have been marked as hung
> 	by the hang checker and checks how long ago it was since it last
> 	attempted to do per-engine hang recovery for each respective, currently
> 	hung engine. If the measured time period is within a certain time
> 	window, i.e. the last per-engine hang recovery was done too recently,
> 	it is determined that the previously attempted per-engine hang recovery
> 	was ineffective and the step is taken to promote the current hang to a
> 	full GPU reset. The default value for this time window is 10 seconds,
> 	meaning any hang happening within 10 seconds of a previous hang on the
> 	same engine will be promoted to full GPU reset. (of course, as long as
> 	the per-engine hang recovery option is disabled this won't matter and
> 	the error handler will always go for legacy full GPU reset)
>
> 	2. If the error handler determines that no currently hung engine has
> 	recently undergone hang recovery, a per-engine hang recovery is scheduled.
>
> 	3. If the decision to go with per-engine hang recovery is not taken, or
> 	if per-engine hang recovery is attempted but failed for whatever
> 	reason, TDR falls back to legacy full GPU recovery.
>
> NOTE: Gen7 and earlier will always promote to full GPU reset since there is
> currently no per-engine reset support for these gens.
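For reference, the promotion decision described in point 1 boils down to the
time-window check in the i915_handle_error() hunk further down; stripped to its
essentials it is roughly:

	u32 now = get_seconds();

	if (now - engine->hangcheck.last_engine_reset_time <
	    i915.gpu_reset_promotion_time) {
		/* Engine was reset too recently: promote to full GPU reset. */
		full_reset = true;
	} else {
		/* Otherwise flag this engine for per-engine reset. */
		atomic_or(I915_ENGINE_RESET_IN_PROGRESS,
			  &engine->hangcheck.flags);
	}

	engine->hangcheck.last_engine_reset_time = now;
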
>
> 2. Context Submission Status Consistency.
>
> Per-engine hang recovery on gen8 (or execlist submission mode in general)
> relies on the basic concept of context submission status consistency. What this
> means is that we make sure that the status of the hardware and the driver when
> it comes to the submission of the currently running context on any engine is
> consistent. For example, when submitting a context to the corresponding ELSP
> port of an engine we expect the owning request of that context to be at the
> head of the corresponding execution list queue. Likewise, as long as the
> context is executing on the GPU we expect the EXECLIST_STATUS register and the
> context status buffer (CSB) to reflect this. Thus, if the context submission
> status is consistent the ID of the currently executing context should be in
> EXECLIST_STATUS and it should be consistent with the context of the head
> request element in the execution list queue corresponding to that engine.
>
> The reason why this is important for per-engine hang recovery in execlist mode
> is because this recovery mode relies on context resubmission in order to resume
> execution following the recovery. If a context has been determined to be hung
> and the per-engine hang recovery mode is engaged leading to the resubmission of
> that context it's important that the hardware is not in fact busy doing
> something else or sitting idle, since a resubmission in that state could
> cause unforeseen side-effects such as unexpected preemptions.
>
> There are rare, although consistently reproducible, situations that have shown
> up in practice where the driver and hardware are no longer consistent with each
> other, e.g. due to lost context completion interrupts after which the hardware
> would be idle but the driver would still think that a context is active.
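In code the consistency check amounts to comparing the context ID in
EXECLIST_STATUS against the context at the head of the software execlist queue,
which is essentially what intel_execlists_TDR_get_current_request() below does:

	hw_ctx = I915_READ(RING_EXECLIST_STATUS_CTX_ID(ring));
	hw_active = I915_READ(RING_EXECLIST_STATUS_LO(ring)) &
		    EXECLIST_STATUS_CURRENT_ACTIVE_ELEMENT_STATUS;

	req = list_first_entry_or_null(&ring->execlist_queue,
				       struct drm_i915_gem_request,
				       execlist_link);
	sw_ctx = req ?
		intel_execlists_ctx_id(req->ctx->engine[ring->id].state) : 0;

	consistent = req ? (hw_ctx == sw_ctx && hw_active) :
			   (!hw_ctx && !hw_active);
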
>
> 3. There is a new reset path for engine reset alongside the legacy full GPU
> reset path. This path does the following:
>
> 	1) Check for context submission consistency to make sure that the
> 	context that the hardware is currently stuck on is actually what the
> 	driver is working on. If not then clearly we're not in a consistently
> 	hung state and we bail out early.
>
> 	2) Disable/idle the engine. This is done through reset handshaking on
> 	gen8+ unlike earlier gens where this was done by clearing the ring
> 	valid bits in MI_MODE and ring control registers, which are no longer
> 	supported on gen8+. Reset handshaking translates to setting the reset
> 	request bit in the reset control register.
>
> 	3) Save the current engine state. What this translates to on gen8 is
> 	simply to read the current value of the head register and nudge it so
> 	that it points to the next valid instruction in the ring buffer. Since
> 	we assume that the execution is currently stuck in a batch buffer the
> 	effect of this is that the batchbuffer start instruction of the hung
> 	batch buffer is skipped so that when execution resumes, following the
> 	hang recovery completion, it resumes immediately following the batch
> 	buffer.
>
> 	This effectively means that we're forcefully terminating the currently
> 	active, hung batch buffer. Obviously, the outcome of this intervention
> 	is potentially undefined but there are not many good options in this
> 	scenario. It's better than resetting the entire GPU in the vast
> 	majority of cases.
>
> 	Save the nudged head value to be applied later.
>
> 	4) Reset the engine.
>
> 	5) Apply the nudged head value to the head register.
>
> 	6) Reenable the engine. For gen8 this means resubmitting the fixed-up
> 	context, allowing execution to resume. In order to resubmit a context
> 	without relying on the currently hung execlist queue we use a new,
> 	privileged API that is dedicated to TDR use only. This submission API
> 	bypasses any currently queued work and gets exclusive access to the
> 	ELSP ports.
>
> 	7) If the engine hang recovery procedure fails at any point in between
> 	disablement and reenablement of the engine there is a back-off
> 	procedure: For gen8 it's possible to back out of the reset handshake by
> 	clearing the reset request bit in the reset control register.
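The request/back-off handshake itself is not in the hunks quoted below (it
lives in the intel_uncore.c part of the diff), but from the description it
presumably comes down to something like the following (register/bit names and
the timeout here are illustrative only):

	/* Request reset and wait for the engine to acknowledge: */
	I915_WRITE(RING_RESET_CTL(engine->mmio_base),
		   _MASKED_BIT_ENABLE(RESET_CTL_REQUEST_RESET));
	ret = wait_for((I915_READ(RING_RESET_CTL(engine->mmio_base)) &
			RESET_CTL_READY_TO_RESET), 700);

	/* Back-off if anything fails between disable and re-enable: */
	I915_WRITE(RING_RESET_CTL(engine->mmio_base),
		   _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));
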
>
> NOTE:
> It's possible that some of Ben Widawsky's original per-engine reset patches
> from 3 years ago are in this commit but since this work has gone through the
> hands of at least 3 people already any kind of ownership tracking has been lost
> a long time ago. If you think that you should be on the sob list just let me
> know.
>
> * RFCv2: (Chris Wilson / Daniel Vetter)
> - Simply use the previously private function i915_gem_reset_ring_status() from
>   the engine hang recovery path to set active/pending context status. This
>   replicates the same behaviour as in full GPU reset but for a single,
>   targeted engine.
>
> - Remove all additional uevents for both full GPU reset and per-engine reset.
>   Adapted uevent behaviour to the new per-engine hang recovery mode in that it
>   will only send one uevent regardless of which form of recovery is employed.
>   If a per-engine reset is attempted first then one uevent will be dispatched.
>   If that recovery mode fails and the hang is promoted to a full GPU reset no
>   further uevents will be dispatched at that point.
>
> - Tidied up the TDR context resubmission path in intel_lrc.c. Reduced the
>   amount of duplication by relying entirely on the normal unqueue function.
>   Added a new parameter to the unqueue function that takes into consideration
>   if the unqueue call is for a first-time context submission or a resubmission
>   and adapts the handling of elsp_submitted accordingly. The reason for
>   this is that for context resubmission we don't expect any further
>   interrupts for the submission or the following context completion. A more
>   elegant way of handling this would be to phase out elsp_submitted
>   altogether, however that's part of a LRC/execlist cleanup effort that is
>   happening independently of this patch series. For now we make this change
>   as simple as possible with as few non-TDR-related side-effects as
>   possible.
>

> Signed-off-by: Tomas Elf <tomas.elf@xxxxxxxxx>
> Signed-off-by: Ian Lister <ian.lister@xxxxxxxxx>
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> Signed-off-by: Arun Siluvery <arun.siluvery@xxxxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_dma.c         |  18 +
>  drivers/gpu/drm/i915/i915_drv.c         | 206 ++++++++++++
>  drivers/gpu/drm/i915/i915_drv.h         |  58 ++++
>  drivers/gpu/drm/i915/i915_irq.c         | 169 +++++++++-
>  drivers/gpu/drm/i915/i915_params.c      |  19 ++
>  drivers/gpu/drm/i915/i915_params.h      |   2 +
>  drivers/gpu/drm/i915/i915_reg.h         |   2 +
>  drivers/gpu/drm/i915/intel_lrc.c        | 565 +++++++++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/intel_lrc.h        |  14 +
>  drivers/gpu/drm/i915/intel_lrc_tdr.h    |  36 ++
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  84 ++++-
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  64 ++++
>  drivers/gpu/drm/i915/intel_uncore.c     | 147 +++++++++
>  13 files changed, 1358 insertions(+), 26 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/intel_lrc_tdr.h
>

1332 lines of new code in a single patch. We need to figure
out how to split this.

The context register write/read code and related macros are
not needed anymore, so that will reduce the line count a lot.

But some random comments for round two inlined below...


> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 44a896c..c45ec353 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -837,6 +837,22 @@ static void intel_device_info_runtime_init(struct drm_device *dev)
>  			 info->has_eu_pg ? "y" : "n");
>  }
>  
> +static void
> +i915_hangcheck_init(struct drm_device *dev)
> +{
> +	int i;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	for (i = 0; i < I915_NUM_RINGS; i++) {
> +		struct intel_engine_cs *engine = &dev_priv->ring[i];
> +		struct intel_ring_hangcheck *hc = &engine->hangcheck;
> +
> +		i915_hangcheck_reinit(engine);

intel_engine_init_hangcheck(engine);


> +		hc->reset_count = 0;
> +		hc->tdr_count = 0;
> +	}
> +}
> +
>  static void intel_init_dpio(struct drm_i915_private *dev_priv)
>  {
>  	/*
> @@ -1034,6 +1050,8 @@ int i915_driver_load(struct drm_device *dev, unsigned long flags)
>  
>  	i915_gem_load(dev);
>  
> +	i915_hangcheck_init(dev);
> +
>  	/* On the 945G/GM, the chipset reports the MSI capability on the
>  	 * integrated graphics even though the support isn't actually there
>  	 * according to the published specs.  It doesn't appear to function
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index f17a2b0..c0ad003 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -34,6 +34,7 @@
>  #include "i915_drv.h"
>  #include "i915_trace.h"
>  #include "intel_drv.h"
> +#include "intel_lrc_tdr.h"

We want to push the pre-gen8 stuff here as well, at least
eventually. So

#include "intel_tdr.h"

>  
>  #include <linux/console.h>
>  #include <linux/module.h>
> @@ -571,6 +572,7 @@ static int i915_drm_suspend(struct drm_device *dev)
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	pci_power_t opregion_target_state;
>  	int error;
> +	int i;
>  
>  	/* ignore lid events during suspend */
>  	mutex_lock(&dev_priv->modeset_restore_lock);
> @@ -596,6 +598,16 @@ static int i915_drm_suspend(struct drm_device *dev)
>  
>  	intel_guc_suspend(dev);
>  
> +	/*
> +	 * Clear any pending reset requests. They should be picked up
> +	 * after resume when new work is submitted
> +	 */
> +	for (i = 0; i < I915_NUM_RINGS; i++)
> +		atomic_set(&dev_priv->ring[i].hangcheck.flags, 0);

This will cause havoc if you ever expand the flag space. If
the comment says that you want to clear pending resets, then
clear them with a mask.
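i.e. something along the lines of:

	for (i = 0; i < I915_NUM_RINGS; i++)
		atomic_clear_mask(I915_ENGINE_RESET_IN_PROGRESS,
				  &dev_priv->ring[i].hangcheck.flags);

so that any flag bits added later survive suspend.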

> +
> +	atomic_clear_mask(I915_RESET_IN_PROGRESS_FLAG,
> +		&dev_priv->gpu_error.reset_counter);
> +
>  	intel_suspend_gt_powersave(dev);
>  
>  	/*
> @@ -948,6 +960,200 @@ int i915_reset(struct drm_device *dev)
>  	return 0;
>  }
>  
> +/**
> + * i915_reset_engine - reset GPU engine after a hang
> + * @engine: engine to reset
> + *
> + * Reset a specific GPU engine. Useful if a hang is detected. Returns zero on successful
> + * reset or otherwise an error code.
> + *
> + * Procedure is fairly simple:
> + *
> + *	- Force engine to idle.
> + *
> + *	- Save current head register value and nudge it past the point of the hang in the
> + *	  ring buffer, which is typically the BB_START instruction of the hung batch buffer,
> + *	  on to the following instruction.
> + *
> + *	- Reset engine.
> + *
> + *	- Restore the previously saved, nudged head register value.
> + *
> + *	- Re-enable engine to resume running. On gen8 this requires the previously hung
> + *	  context to be resubmitted to ELSP via the dedicated TDR-execlists interface.
> + *
> + */
> +int i915_reset_engine(struct intel_engine_cs *engine)
> +{
> +	struct drm_device *dev = engine->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct drm_i915_gem_request *current_request = NULL;
> +	uint32_t head;
> +	bool force_advance = false;
> +	int ret = 0;
> +	int err_ret = 0;
> +
> +	WARN_ON(!mutex_is_locked(&dev->struct_mutex));
> +
> +        /* Take wake lock to prevent power saving mode */
> +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> +
> +	i915_gem_reset_ring_status(dev_priv, engine);
> 
> +	if (i915.enable_execlists) {
> +		enum context_submission_status status =
> +			intel_execlists_TDR_get_current_request(engine, NULL);
> +
> +		/*
> +		 * If the context submission state in hardware is not
> +		 * consistent with the the corresponding state in the driver or
> +		 * if there for some reason is no current context in the
> +		 * process of being submitted then bail out and try again. Do
> +		 * not proceed unless we have reliable current context state
> +		 * information. The reason why this is important is because
> +		 * per-engine hang recovery relies on context resubmission in
> +		 * order to force the execution to resume following the hung
> +		 * batch buffer. If the hardware is not currently running the
> +		 * same context as the driver thinks is hung then anything can
> +		 * happen at the point of context resubmission, e.g. unexpected
> +		 * preemptions or the previously hung context could be
> +		 * submitted when the hardware is idle which makes no sense.
> +		 */
> +		if (status != CONTEXT_SUBMISSION_STATUS_OK) {
> +			ret = -EAGAIN;
> +			goto reset_engine_error;
> +		}
> +	}

This whole ambivalence troubles me. If our hangcheck part is lacking so
that it will reset engines that really are not stuck, then we should
move/improve this logic on the hangcheck side.

We are juggling the execlist lock here, inside
intel_execlist_TDR_get_current_request, and across multiple calls to it.

We need to hold the execlist lock during the state save and
restore.

> +
> +	ret = intel_ring_disable(engine);
> +	if (ret != 0) {
> +		DRM_ERROR("Failed to disable %s\n", engine->name);
> +		goto reset_engine_error;
> +	}
> +
> +	if (i915.enable_execlists) {
> +		enum context_submission_status status;
> +		bool inconsistent;
> +
> +		status = intel_execlists_TDR_get_current_request(engine,
> +				&current_request);
> +

intel_execlist_get_current_request()
intel_execlist_get_submission_status()

If we hold the lock, there is no need to do everything in the same function.

And move the referencing of current_request up to this context, as
the unreferencing is already done here.
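Sketching what I mean (function names as suggested above, purely illustrative):

	spin_lock_irqsave(&engine->execlist_lock, flags);

	status = intel_execlist_get_submission_status(engine);
	if (status == CONTEXT_SUBMISSION_STATUS_OK) {
		current_request = intel_execlist_get_current_request(engine);
		if (current_request)
			i915_gem_request_reference(current_request);
	}

	spin_unlock_irqrestore(&engine->execlist_lock, flags);

and the same lock hold could then be extended over the state save/restore as
noted earlier.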


> +		inconsistent = (status != CONTEXT_SUBMISSION_STATUS_OK);
> +		if (inconsistent) {
> +			/*
> +			 * If we somehow have reached this point with
> +			 * an inconsistent context submission status then
> +			 * back out of the previously requested reset and
> +			 * retry later.
> +			 */
> +			WARN(inconsistent,
> +			     "Inconsistent context status on %s: %u\n",
> +			     engine->name, status);
> +
> +			ret = -EAGAIN;
> +			goto reenable_reset_engine_error;
> +		}
> +	}
> +
> +	/* Sample the current ring head position */
> +	head = I915_READ_HEAD(engine) & HEAD_ADDR;

intel_ring_get_active_head(engine);

> +
> +	if (head == engine->hangcheck.last_head) {
> +		/*
> +		 * The engine has not advanced since the last
> +		 * time it hung so force it to advance to the
> +		 * next QWORD. In most cases the engine head
> +		 * pointer will automatically advance to the
> +		 * next instruction as soon as it has read the
> +		 * current instruction, without waiting for it
> +		 * to complete. This seems to be the default
> +		 * behaviour, however an MBOX wait inserted
> +		 * directly to the VCS/BCS engines does not behave
> +		 * in the same way, instead the head pointer
> +		 * will still be pointing at the MBOX instruction
> +		 * until it completes.
> +		 */
> +		force_advance = true;
> +	}
> +
> +	engine->hangcheck.last_head = head;
> +
> +	ret = intel_ring_save(engine, current_request, force_advance);

intel_engine_save()

> +	if (ret) {
> +		DRM_ERROR("Failed to save %s engine state\n", engine->name);
> +		goto reenable_reset_engine_error;
> +	}
> +
> +	ret = intel_gpu_engine_reset(engine);

intel_engine_reset()

> +	if (ret) {
> +		DRM_ERROR("Failed to reset %s\n", engine->name);
> +		goto reenable_reset_engine_error;
> +	}
> +
> +	ret = intel_ring_restore(engine, current_request);

intel_engine_restore()

> +	if (ret) {
> +		DRM_ERROR("Failed to restore %s engine state\n", engine->name);
> +		goto reenable_reset_engine_error;
> +	}
> +
> +	/* Correct driver state */
> +	intel_gpu_engine_reset_resample(engine, current_request);

This looks like it resamples the head.

intel_engine_reset_head()

> +
> +	/*
> +	 * Reenable engine
> +	 *
> +	 * In execlist mode on gen8+ this is implicit by simply resubmitting
> +	 * the previously hung context. In ring buffer submission mode on gen7
> +	 * and earlier we need to actively turn on the engine first.
> +	 */
> +	if (i915.enable_execlists)
> +		intel_execlists_TDR_context_resubmission(engine);

intel_logical_ring_enable()?

> +	else
> +		ret = intel_ring_enable(engine);
> +

> +	if (ret) {
> +		DRM_ERROR("Failed to enable %s again after reset\n",
> +			engine->name);
> +
> +		goto reset_engine_error;
> +	}
> +
> +	/* Clear reset flags to allow future hangchecks */
> +	atomic_set(&engine->hangcheck.flags, 0);
> +
> +	/* Wake up anything waiting on this engine's queue */
> +	wake_up_all(&engine->irq_queue);
> +
> +	if (i915.enable_execlists && current_request)
> +		i915_gem_request_unreference(current_request);
> +
> +	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> +

reset_engine_error: is identical to code block above.
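One way to fold them together would be to let all paths fall through to a
single cleanup label, roughly:

	/* success path */
	goto out;

reenable_reset_engine_error:
	err_ret = intel_ring_enable(engine);
	if (err_ret)
		DRM_ERROR("Failed to reenable %s following error during reset (%d)\n",
			  engine->name, err_ret);
reset_engine_error:
out:
	/* Clear reset flags to allow future hangchecks */
	atomic_set(&engine->hangcheck.flags, 0);

	/* Wake up anything waiting on this engine's queue */
	wake_up_all(&engine->irq_queue);

	if (i915.enable_execlists && current_request)
		i915_gem_request_unreference(current_request);

	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);

	return ret;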

> +	return ret;
> +
> +reenable_reset_engine_error:
> +
> +	err_ret = intel_ring_enable(engine);
> +	if (err_ret)
> +		DRM_ERROR("Failed to reenable %s following error during reset (%d)\n",
> +			engine->name, err_ret);
> +
> +reset_engine_error:
> +
> +	/* Clear reset flags to allow future hangchecks */
> +	atomic_set(&engine->hangcheck.flags, 0);
> +
> +	/* Wake up anything waiting on this engine's queue */
> +	wake_up_all(&engine->irq_queue);
> +
> +	if (i915.enable_execlists && current_request)
> +		i915_gem_request_unreference(current_request);
> +
> +	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> +
> +	return ret;
> +}
> +
>  static int i915_pci_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
>  {
>  	struct intel_device_info *intel_info =
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 703a320..e866f14 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2432,6 +2432,48 @@ struct drm_i915_cmd_table {
>  	int count;
>  };
>  
> +/*
> + * Context submission status
> + *
> + * CONTEXT_SUBMISSION_STATUS_OK:
> + *	Context submitted to ELSP and state of execlist queue is the same as
> + *	the state of EXECLIST_STATUS register. Software and hardware states
> + *	are consistent and can be trusted.
> + *
> + * CONTEXT_SUBMISSION_STATUS_INCONSISTENT:
> + *	Context has been submitted to the execlist queue but the state of the
> + *	EXECLIST_STATUS register is different from the execlist queue state.
> + *	This could mean any of the following:
> + *
> + *		1. The context is in the head position of the execlist queue
> + *		   but has not yet been submitted to ELSP.
> + *
> + *		2. The hardware just recently completed the context but the
> + *		   context is pending removal from the execlist queue.
> + *
> + *		3. The driver has lost a context state transition interrupt.
> + *		   Typically what this means is that hardware has completed and
> + *		   is now idle but the driver thinks the hardware is still
> + *		   busy.
> + *
> + *	Overall what this means is that the context submission status is
> + *	currently in transition and cannot be trusted until it settles down.
> + *
> + * CONTEXT_SUBMISSION_STATUS_NONE_SUBMITTED:
> + *	No context submitted to the execlist queue and the EXECLIST_STATUS
> + *	register shows no context being processed.
> + *
> + * CONTEXT_SUBMISSION_STATUS_NONE_UNDEFINED:
> + *	Initial state before submission status has been determined.
> + *
> + */
> +enum context_submission_status {
> +	CONTEXT_SUBMISSION_STATUS_OK = 0,
> +	CONTEXT_SUBMISSION_STATUS_INCONSISTENT,
> +	CONTEXT_SUBMISSION_STATUS_NONE_SUBMITTED,
> +	CONTEXT_SUBMISSION_STATUS_UNDEFINED
> +};
> +
>  /* Note that the (struct drm_i915_private *) cast is just to shut up gcc. */
>  #define __I915__(p) ({ \
>  	struct drm_i915_private *__p; \
> @@ -2690,8 +2732,12 @@ extern long i915_compat_ioctl(struct file *filp, unsigned int cmd,
>  			      unsigned long arg);
>  #endif
>  extern int intel_gpu_reset(struct drm_device *dev);
> +extern int intel_gpu_engine_reset(struct intel_engine_cs *engine);
> +extern int intel_request_gpu_engine_reset(struct intel_engine_cs *engine);
> +extern int intel_unrequest_gpu_engine_reset(struct intel_engine_cs *engine);
>  extern bool intel_has_gpu_reset(struct drm_device *dev);
>  extern int i915_reset(struct drm_device *dev);
> +extern int i915_reset_engine(struct intel_engine_cs *engine);
>  extern unsigned long i915_chipset_val(struct drm_i915_private *dev_priv);
>  extern unsigned long i915_mch_val(struct drm_i915_private *dev_priv);
>  extern unsigned long i915_gfx_val(struct drm_i915_private *dev_priv);
> @@ -2704,6 +2750,18 @@ void intel_hpd_init(struct drm_i915_private *dev_priv);
>  void intel_hpd_init_work(struct drm_i915_private *dev_priv);
>  void intel_hpd_cancel_work(struct drm_i915_private *dev_priv);
>  bool intel_hpd_pin_to_port(enum hpd_pin pin, enum port *port);
> +static inline void i915_hangcheck_reinit(struct intel_engine_cs *engine)
> +{
> +	struct intel_ring_hangcheck *hc = &engine->hangcheck;
> +
> +	hc->acthd = 0;
> +	hc->max_acthd = 0;
> +	hc->seqno = 0;
> +	hc->score = 0;
> +	hc->action = HANGCHECK_IDLE;
> +	hc->deadlock = 0;
> +}
> +

Rename to intel_engine_hangcheck_init and move it to intel_ringbuffer.c

>  
>  /* i915_irq.c */
>  void i915_queue_hangcheck(struct drm_device *dev);
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index f04d799..6a0ec37 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -2470,10 +2470,70 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
>  	char *error_event[] = { I915_ERROR_UEVENT "=1", NULL };
>  	char *reset_event[] = { I915_RESET_UEVENT "=1", NULL };
>  	char *reset_done_event[] = { I915_ERROR_UEVENT "=0", NULL };
> -	int ret;
> +	bool reset_complete = false;
> +	struct intel_engine_cs *ring;
> +	int ret = 0;
> +	int i;
> +
> +	mutex_lock(&dev->struct_mutex);
>  
>  	kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE, error_event);
>  
> +	for_each_ring(ring, dev_priv, i) {
> +
> +		/*
> +		 * Skip further individual engine reset requests if full GPU
> +		 * reset requested.
> +		 */
> +		if (i915_reset_in_progress(error))
> +			break;
> +
> +		if (atomic_read(&ring->hangcheck.flags) &
> +			I915_ENGINE_RESET_IN_PROGRESS) {
> +
> +			if (!reset_complete)
> +				kobject_uevent_env(&dev->primary->kdev->kobj,
> +						   KOBJ_CHANGE,
> +						   reset_event);
> +
> +			reset_complete = true;
> +
> +			ret = i915_reset_engine(ring);
> +
> +			/*
> +			 * Execlist mode only:
> +			 *
> +			 * -EAGAIN means that between detecting a hang (and
> +			 * also determining that the currently submitted
> +			 * context is stable and valid) and trying to recover
> +			 * from the hang the current context changed state.
> +			 * This means that we are probably not completely hung
> +			 * after all. Just fail and retry by exiting all the
> +			 * way back and wait for the next hang detection. If we
> +			 * have a true hang on our hands then we will detect it
> +			 * again, otherwise we will continue like nothing
> +			 * happened.
> +			 */
> +			if (ret == -EAGAIN) {
> +				DRM_ERROR("Reset of %s aborted due to " \
> +					  "change in context submission " \
> +					  "state - retrying!", ring->name);
> +				ret = 0;
> +			}
> +
> +			if (ret) {
> +				DRM_ERROR("Reset of %s failed! (%d)", ring->name, ret);
> +
> +				atomic_or(I915_RESET_IN_PROGRESS_FLAG,
> +					&dev_priv->gpu_error.reset_counter);
> +				break;
> +			}
> +		}
> +	}
> +
> +	/* The full GPU reset will grab the struct_mutex when it needs it */
> +	mutex_unlock(&dev->struct_mutex);
> +
>  	/*
>  	 * Note that there's only one work item which does gpu resets, so we
>  	 * need not worry about concurrent gpu resets potentially incrementing
> @@ -2486,8 +2546,13 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
>  	 */
>  	if (i915_reset_in_progress(error) && !i915_terminally_wedged(error)) {
>  		DRM_DEBUG_DRIVER("resetting chip\n");
> -		kobject_uevent_env(&dev->primary->kdev->kobj, KOBJ_CHANGE,
> -				   reset_event);
> +
> +		if (!reset_complete)
> +			kobject_uevent_env(&dev->primary->kdev->kobj,
> +					   KOBJ_CHANGE,
> +					   reset_event);
> +
> +		reset_complete = true;
>  
>  		/*
>  		 * In most cases it's guaranteed that we get here with an RPM
> @@ -2520,23 +2585,36 @@ static void i915_reset_and_wakeup(struct drm_device *dev)
>  			 *
>  			 * Since unlock operations are a one-sided barrier only,
>  			 * we need to insert a barrier here to order any seqno
> -			 * updates before
> -			 * the counter increment.
> +			 * updates before the counter increment.
> +			 *
> +			 * The increment clears I915_RESET_IN_PROGRESS_FLAG.
>  			 */
>  			smp_mb__before_atomic();
>  			atomic_inc(&dev_priv->gpu_error.reset_counter);
>  
> -			kobject_uevent_env(&dev->primary->kdev->kobj,
> -					   KOBJ_CHANGE, reset_done_event);
> +			/*
> +			 * If any per-engine resets were promoted to full GPU
> +			 * reset don't forget to clear those reset flags.
> +			 */
> +			for_each_ring(ring, dev_priv, i)
> +				atomic_set(&ring->hangcheck.flags, 0);
>  		} else {
> +			/* Terminal wedge condition */
> +			WARN(1, "i915_reset failed, declaring GPU as wedged!\n");
>  			atomic_or(I915_WEDGED, &error->reset_counter);
>  		}
> +	}
>  
> -		/*
> -		 * Note: The wake_up also serves as a memory barrier so that
> -		 * waiters see the update value of the reset counter atomic_t.
> -		 */
> +	/*
> +	 * Note: The wake_up also serves as a memory barrier so that
> +	 * waiters see the update value of the reset counter atomic_t.
> +	 */
> +	if (reset_complete) {
>  		i915_error_wake_up(dev_priv, true);
> +
> +		if (ret == 0)
> +			kobject_uevent_env(&dev->primary->kdev->kobj,
> +					   KOBJ_CHANGE, reset_done_event);
>  	}
>  }
>  
> @@ -2649,6 +2727,14 @@ void i915_handle_error(struct drm_device *dev, bool wedged,
>  	va_list args;
>  	char error_msg[80];
>  
> +	struct intel_engine_cs *engine;
> +
> +	/*
> +	 * NB: Placeholder until the hang checker supports
> +	 * per-engine hang detection.
> +	 */
> +	u32 engine_mask = 0;
> +
>  	va_start(args, fmt);
>  	vscnprintf(error_msg, sizeof(error_msg), fmt, args);
>  	va_end(args);
> @@ -2657,8 +2743,65 @@ void i915_handle_error(struct drm_device *dev, bool wedged,
>  	i915_report_and_clear_eir(dev);
>  
>  	if (wedged) {
> -		atomic_or(I915_RESET_IN_PROGRESS_FLAG,
> -				&dev_priv->gpu_error.reset_counter);
> +		/*
> +		 * Defer to full GPU reset if any of the following is true:
> +		 *	0. Engine reset disabled.
> +		 * 	1. The caller did not ask for per-engine reset.
> +		 *	2. The hardware does not support it (pre-gen7).
> +		 *	3. We already tried per-engine reset recently.
> +		 */
> +		bool full_reset = true;
> +
> +		if (!i915.enable_engine_reset) {
> +			DRM_INFO("Engine reset disabled: Using full GPU reset.\n");
> +			engine_mask = 0x0;
> +		}
> +
> +		/*
> +		 * TBD: We currently only support per-engine reset for gen8+.
> +		 * Implement support for gen7.
> +		 */
> +		if (engine_mask && (INTEL_INFO(dev)->gen >= 8)) {
> +			u32 i;
> +
> +			for_each_ring(engine, dev_priv, i) {
> +				u32 now, last_engine_reset_timediff;
> +
> +				if (!(intel_ring_flag(engine) & engine_mask))
> +					continue;
> +
> +				/* Measure the time since this engine was last reset */
> +				now = get_seconds();
> +				last_engine_reset_timediff =
> +					now - engine->hangcheck.last_engine_reset_time;
> +
> +				full_reset = last_engine_reset_timediff <
> +					i915.gpu_reset_promotion_time;
> +
> +				engine->hangcheck.last_engine_reset_time = now;
> +
> +				/*
> +				 * This engine was not reset too recently - go ahead
> +				 * with engine reset instead of falling back to full
> +				 * GPU reset.
> +				 *
> +				 * Flag that we want to try and reset this engine.
> +				 * This can still be overridden by a global
> +				 * reset e.g. if per-engine reset fails.
> +				 */
> +				if (!full_reset)
> +					atomic_or(I915_ENGINE_RESET_IN_PROGRESS,
> +						&engine->hangcheck.flags);
> +				else
> +					break;
> +
> +			} /* for_each_ring */
> +		}
> +
> +		if (full_reset) {
> +			atomic_or(I915_RESET_IN_PROGRESS_FLAG,
> +					&dev_priv->gpu_error.reset_counter);
> +		}
>  
>  		/*
>  		 * Wakeup waiting processes so that the reset function
> diff --git a/drivers/gpu/drm/i915/i915_params.c b/drivers/gpu/drm/i915/i915_params.c
> index 8d90c25..5cf9c11 100644
> --- a/drivers/gpu/drm/i915/i915_params.c
> +++ b/drivers/gpu/drm/i915/i915_params.c
> @@ -37,6 +37,8 @@ struct i915_params i915 __read_mostly = {
>  	.enable_fbc = -1,
>  	.enable_execlists = -1,
>  	.enable_hangcheck = true,
> +	.enable_engine_reset = false,
> +	.gpu_reset_promotion_time = 10,
>  	.enable_ppgtt = -1,
>  	.enable_psr = 0,
>  	.preliminary_hw_support = IS_ENABLED(CONFIG_DRM_I915_PRELIMINARY_HW_SUPPORT),
> @@ -116,6 +118,23 @@ MODULE_PARM_DESC(enable_hangcheck,
>  	"WARNING: Disabling this can cause system wide hangs. "
>  	"(default: true)");
>  
> +module_param_named_unsafe(enable_engine_reset, i915.enable_engine_reset, bool, 0644);
> +MODULE_PARM_DESC(enable_engine_reset,
> +	"Enable GPU engine hang recovery mode. Used as a soft, low-impact form "
> +	"of hang recovery that targets individual GPU engines rather than the "
> +	"entire GPU"
> +	"(default: false)");
> +
> +module_param_named(gpu_reset_promotion_time,
> +               i915.gpu_reset_promotion_time, int, 0644);
> +MODULE_PARM_DESC(gpu_reset_promotion_time,
> +               "Catch excessive engine resets. Each engine maintains a "
> +	       "timestamp of the last time it was reset. If it hangs again "
> +	       "within this period then fall back to full GPU reset to try and"
> +	       " recover from the hang. Only applicable if enable_engine_reset "
> +	       "is enabled."
> +               "default=10 seconds");
> +
>  module_param_named_unsafe(enable_ppgtt, i915.enable_ppgtt, int, 0400);
>  MODULE_PARM_DESC(enable_ppgtt,
>  	"Override PPGTT usage. "
> diff --git a/drivers/gpu/drm/i915/i915_params.h b/drivers/gpu/drm/i915/i915_params.h
> index 5299290..60f3d23 100644
> --- a/drivers/gpu/drm/i915/i915_params.h
> +++ b/drivers/gpu/drm/i915/i915_params.h
> @@ -49,8 +49,10 @@ struct i915_params {
>  	int use_mmio_flip;
>  	int mmio_debug;
>  	int edp_vswing;
> +	unsigned int gpu_reset_promotion_time;
>  	/* leave bools at the end to not create holes */
>  	bool enable_hangcheck;
> +	bool enable_engine_reset;
>  	bool fastboot;
>  	bool prefault_disable;
>  	bool load_detect_test;
> diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
> index 0a98889..3fc5d75 100644
> --- a/drivers/gpu/drm/i915/i915_reg.h
> +++ b/drivers/gpu/drm/i915/i915_reg.h
> @@ -164,6 +164,8 @@ static inline bool i915_mmio_reg_valid(i915_reg_t reg)
>  #define  GEN6_GRDOM_RENDER		(1 << 1)
>  #define  GEN6_GRDOM_MEDIA		(1 << 2)
>  #define  GEN6_GRDOM_BLT			(1 << 3)
> +#define  GEN6_GRDOM_VECS		(1 << 4)
> +#define  GEN8_GRDOM_MEDIA2		(1 << 7)
>  
>  #define RING_PP_DIR_BASE(ring)		_MMIO((ring)->mmio_base+0x228)
>  #define RING_PP_DIR_BASE_READ(ring)	_MMIO((ring)->mmio_base+0x518)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index ab344e0..fcec476 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -136,6 +136,7 @@
>  #include <drm/i915_drm.h>
>  #include "i915_drv.h"
>  #include "intel_mocs.h"
> +#include "intel_lrc_tdr.h"
>  
>  #define GEN9_LR_CONTEXT_RENDER_SIZE (22 * PAGE_SIZE)
>  #define GEN8_LR_CONTEXT_RENDER_SIZE (20 * PAGE_SIZE)
> @@ -325,7 +326,8 @@ uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
>  }
>  
>  static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
> -				 struct drm_i915_gem_request *rq1)
> +				 struct drm_i915_gem_request *rq1,
> +				 bool tdr_resubmission)
>  {
>  
>  	struct intel_engine_cs *ring = rq0->ring;
> @@ -335,13 +337,17 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
>  
>  	if (rq1) {
>  		desc[1] = intel_lr_context_descriptor(rq1->ctx, rq1->ring);
> -		rq1->elsp_submitted++;
> +
> +		if (!tdr_resubmission)
> +			rq1->elsp_submitted++;
>  	} else {
>  		desc[1] = 0;
>  	}
>  
>  	desc[0] = intel_lr_context_descriptor(rq0->ctx, rq0->ring);
> -	rq0->elsp_submitted++;
> +
> +	if (!tdr_resubmission)
> +		rq0->elsp_submitted++;
>  
>  	/* You must always write both descriptors in the order below. */
>  	spin_lock(&dev_priv->uncore.lock);
> @@ -359,6 +365,182 @@ static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
>  	spin_unlock(&dev_priv->uncore.lock);
>  }
>  
> +/**
> + * execlist_get_context_reg_page() - Get memory page for context object
> + * @engine: engine
> + * @ctx: context running on engine
> + * @page: returned page
> + *
> + * Return: 0 if successful, otherwise propagates error codes.
> + */
> +static inline int execlist_get_context_reg_page(struct intel_engine_cs *engine,
> +		struct intel_context *ctx,
> +		struct page **page)
> +{

All the macros and reg_page stuff can be removed as 
there is ctx->engine[id].lrc_reg_state for pinned
ctx objects.
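With a pinned context the state page is already mapped, so the helper and the
kmap dance collapse into direct accesses, roughly (new_head just stands for
whatever value is being written):

	uint32_t *reg_state = ctx->engine[engine->id].lrc_reg_state;

	head = reg_state[CTX_RING_HEAD + 1];		/* read */
	reg_state[CTX_RING_HEAD + 1] = new_head;	/* write */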

> +	struct drm_i915_gem_object *ctx_obj;
> +
> +	if (!page)
> +		return -EINVAL;
> +
> +	if (!ctx)
> +		ctx = engine->default_context;
> +

No. Add a WARN which triggers if someone tries to
touch the default_context through this mechanism.

The default context should be sacred; we don't want any state to
accidentally creep into it.
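i.e. rather than silently substituting the default context, something like:

	if (WARN_ON(!ctx || ctx == engine->default_context))
		return -EINVAL;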


> +	ctx_obj = ctx->engine[engine->id].state;
> +
> +	if (WARN(!ctx_obj, "Context object not set up!\n"))
> +		return -EINVAL;
> +
> +	WARN(!i915_gem_obj_is_pinned(ctx_obj),
> +	     "Context object is not pinned!\n");
> +
> +	*page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);

> +
> +	if (WARN(!*page, "Context object page could not be resolved!\n"))
> +		return -EINVAL;
> +
> +	return 0;
> +}
> +
> +/**
> + * execlist_write_context_reg() - Write value to Context register
> + * @engine: Engine
> + * @ctx: Context running on engine
> + * @ctx_reg: Index into context image pointing to register location
> + * @mmio_reg_addr: MMIO register address
> + * @val: Value to be written
> + * @mmio_reg_name_str: Designated register name
> + *
> + * Return: 0 if successful, otherwise propagates error codes.
> + */
> +static inline int execlists_write_context_reg(struct intel_engine_cs *engine,
> +					      struct intel_context *ctx,
> +					      u32 ctx_reg,
> +					      i915_reg_t mmio_reg,
> +					      u32 val,
> +					      const char *mmio_reg_name_str)
> +{

> +	struct page *page = NULL;
> +	uint32_t *reg_state;
> +
> +	int ret = execlist_get_context_reg_page(engine, ctx, &page);
> +	if (WARN(ret, "[write %s:%u] Failed to get context memory page for %s!\n",
> +		 mmio_reg_name_str, (unsigned int) mmio_reg.reg, engine->name)) {
> +		return ret;
> +	}
> +
> +	reg_state = kmap_atomic(page);
> +
> +	WARN(reg_state[ctx_reg] != mmio_reg.reg,
> +	     "[write %s:%u]: Context reg addr (%x) != MMIO reg addr (%x)!\n",
> +	     mmio_reg_name_str,
> +	     (unsigned int) mmio_reg.reg,
> +	     (unsigned int) reg_state[ctx_reg],
> +	     (unsigned int) mmio_reg.reg);
> +
> +	reg_state[ctx_reg+1] = val;
> +	kunmap_atomic(reg_state);
> +
> +	return ret;
> +}
> +
> +/**
> + * execlist_read_context_reg() - Read value from Context register
> + * @engine: Engine
> + * @ctx: Context running on engine
> + * @ctx_reg: Index into context image pointing to register location
> + * @mmio_reg: MMIO register struct
> + * @val: Output parameter returning register value
> + * @mmio_reg_name_str: Designated register name
> + *
> + * Return: 0 if successful, otherwise propagates error codes.
> + */
> +static inline int execlists_read_context_reg(struct intel_engine_cs *engine,
> +					     struct intel_context *ctx,
> +					     u32 ctx_reg,
> +					     i915_reg_t mmio_reg,
> +					     u32 *val,
> +					     const char *mmio_reg_name_str)
> +{


> +	struct page *page = NULL;
> +	uint32_t *reg_state;
> +	int ret = 0;
> +
> +	if (!val)
> +		return -EINVAL;
> +
> +	ret = execlist_get_context_reg_page(engine, ctx, &page);
> +	if (WARN(ret, "[read %s:%u] Failed to get context memory page for %s!\n",
> +		 mmio_reg_name_str, (unsigned int) mmio_reg.reg, engine->name)) {
> +		return ret;
> +	}
> +
> +	reg_state = kmap_atomic(page);
> +
> +	WARN(reg_state[ctx_reg] != mmio_reg.reg,
> +	     "[read %s:%u]: Context reg addr (%x) != MMIO reg addr (%x)!\n",
> +	     mmio_reg_name_str,
> +	     (unsigned int) ctx_reg,
> +	     (unsigned int) reg_state[ctx_reg],
> +	     (unsigned int) mmio_reg.reg);
> +
> +	*val = reg_state[ctx_reg+1];
> +	kunmap_atomic(reg_state);
> +
> +	return ret;
> + }
> +
> +/*
> + * Generic macros for generating function implementation for context register
> + * read/write functions.
> + *
> + * Macro parameters
> + * ----------------
> + * reg_name: Designated name of context register (e.g. tail, head, buffer_ctl)
> + *
> + * reg_def: Context register macro definition (e.g. CTX_RING_TAIL)
> + *
> + * mmio_reg_def: Name of macro function used to determine the address
> + *		 of the corresponding MMIO register (e.g. RING_TAIL, RING_HEAD).
> + *		 This macro function is assumed to be defined on the form of:
> + *
> + *			#define mmio_reg_def(base) (base+register_offset)
> + *
> + *		 Where "base" is the MMIO base address of the respective ring
> + *		 and "register_offset" is the offset relative to "base".
> + *
> + * Function parameters
> + * -------------------
> + * engine: The engine that the context is running on
> + * ctx: The context of the register that is to be accessed
> + * reg_name: Value to be written/read to/from the register.
> + */
> +#define INTEL_EXECLISTS_WRITE_REG(reg_name, reg_def, mmio_reg_def) \
> +	int intel_execlists_write_##reg_name(struct intel_engine_cs *engine, \
> +					     struct intel_context *ctx, \
> +					     u32 reg_name) \
> +{ \
> +	return execlists_write_context_reg(engine, ctx, (reg_def), \
> +			mmio_reg_def(engine->mmio_base), (reg_name), \
> +			(#reg_name)); \
> +}
> +
> +#define INTEL_EXECLISTS_READ_REG(reg_name, reg_def, mmio_reg_def) \
> +	int intel_execlists_read_##reg_name(struct intel_engine_cs *engine, \
> +					    struct intel_context *ctx, \
> +					    u32 *reg_name) \
> +{ \
> +	return execlists_read_context_reg(engine, ctx, (reg_def), \
> +			mmio_reg_def(engine->mmio_base), (reg_name), \
> +			(#reg_name)); \
> +}
> +
> +INTEL_EXECLISTS_READ_REG(tail, CTX_RING_TAIL, RING_TAIL)
> +INTEL_EXECLISTS_WRITE_REG(head, CTX_RING_HEAD, RING_HEAD)
> +INTEL_EXECLISTS_READ_REG(head, CTX_RING_HEAD, RING_HEAD)
> +
> +#undef INTEL_EXECLISTS_READ_REG
> +#undef INTEL_EXECLISTS_WRITE_REG
> +
>  static int execlists_update_context(struct drm_i915_gem_request *rq)
>  {
>  	struct intel_engine_cs *ring = rq->ring;
> @@ -396,17 +578,18 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
>  }
>  
>  static void execlists_submit_requests(struct drm_i915_gem_request *rq0,
> -				      struct drm_i915_gem_request *rq1)
> +				      struct drm_i915_gem_request *rq1,
> +				      bool tdr_resubmission)
>  {
>  	execlists_update_context(rq0);
>  
>  	if (rq1)
>  		execlists_update_context(rq1);
>  
> -	execlists_elsp_write(rq0, rq1);
> +	execlists_elsp_write(rq0, rq1, tdr_resubmission);
>  }
>  
> -static void execlists_context_unqueue(struct intel_engine_cs *ring)
> +static void execlists_context_unqueue(struct intel_engine_cs *ring, bool tdr_resubmission)
>  {
>  	struct drm_i915_gem_request *req0 = NULL, *req1 = NULL;
>  	struct drm_i915_gem_request *cursor = NULL, *tmp = NULL;
> @@ -440,6 +623,16 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  		}
>  	}
>  
> +	/*
> +	 * Only do TDR resubmission of the second head request if it's already
> +	 * been submitted. The intention is to restore the original submission
> +	 * state from the situation when the hang originally happened. If it
> +	 * was never submitted we don't want to submit it for the first time at
> +	 * this point
> +	 */
> +	if (tdr_resubmission && req1 && !req1->elsp_submitted)
> +		req1 = NULL;
> +
>  	if (IS_GEN8(ring->dev) || IS_GEN9(ring->dev)) {
>  		/*
>  		 * WaIdleLiteRestore: make sure we never cause a lite
> @@ -460,9 +653,32 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  		}
>  	}
>  
> -	WARN_ON(req1 && req1->elsp_submitted);
> +	WARN_ON(req1 && req1->elsp_submitted && !tdr_resubmission);
>  
> -	execlists_submit_requests(req0, req1);
> +	execlists_submit_requests(req0, req1, tdr_resubmission);
> +}
> +
> +/**
> + * intel_execlists_TDR_context_resubmission() - ELSP context resubmission
> + * @ring: engine to do resubmission for.
> + *
> + * Context submission mechanism exclusively used by TDR that bypasses the
> + * execlist queue. This is necessary since at the point of TDR hang recovery
> + * the hardware will be hung and resubmitting a fixed context (the context that
> + * the TDR has identified as hung and fixed up in order to move past the
> + * blocking batch buffer) to a hung execlist queue will lock up the TDR.
> + * Instead, opt for direct ELSP submission without depending on the rest of the
> + * driver.
> + */
> +void intel_execlists_TDR_context_resubmission(struct intel_engine_cs *ring)
> +{
> +	unsigned long flags;
> +
> +	spin_lock_irqsave(&ring->execlist_lock, flags);
> +	WARN_ON(list_empty(&ring->execlist_queue));
> +
> +	execlists_context_unqueue(ring, true);
> +	spin_unlock_irqrestore(&ring->execlist_lock, flags);
>  }
>  
>  static bool execlists_check_remove_request(struct intel_engine_cs *ring,
> @@ -560,9 +776,9 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
>  		/* Prevent a ctx to preempt itself */
>  		if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) &&
>  		    (submit_contexts != 0))
> -			execlists_context_unqueue(ring);
> +			execlists_context_unqueue(ring, false);
>  	} else if (submit_contexts != 0) {
> -		execlists_context_unqueue(ring);
> +		execlists_context_unqueue(ring, false);
>  	}
>  
>  	spin_unlock(&ring->execlist_lock);
> @@ -613,7 +829,7 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
>  
>  	list_add_tail(&request->execlist_link, &ring->execlist_queue);
>  	if (num_elements == 0)
> -		execlists_context_unqueue(ring);
> +		execlists_context_unqueue(ring, false);
>  
>  	spin_unlock_irq(&ring->execlist_lock);
>  
> @@ -1536,7 +1752,7 @@ static int gen8_init_common_ring(struct intel_engine_cs *ring)
>  	ring->next_context_status_buffer = next_context_status_buffer_hw;
>  	DRM_DEBUG_DRIVER("Execlists enabled for %s\n", ring->name);
>  
> -	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
> +	i915_hangcheck_reinit(ring);
>  
>  	return 0;
>  }
> @@ -1888,6 +2104,187 @@ out:
>  	return ret;
>  }
>  
> +static int
> +gen8_ring_disable(struct intel_engine_cs *ring)
> +{
> +	intel_request_gpu_engine_reset(ring);
> +	return 0;
> +}
> +
> +static int
> +gen8_ring_enable(struct intel_engine_cs *ring)
> +{
> +	intel_unrequest_gpu_engine_reset(ring);
> +	return 0;
> +}
> +
> +/**
> + * gen8_ring_save() - save minimum engine state
> + * @ring: engine whose state is to be saved
> + * @req: request containing the context currently running on engine
> + * @force_advance: indicates whether or not we should nudge the head
> + *		  forward or not
> + *
> + * Saves the head MMIO register to scratch memory while engine is reset and
> + * reinitialized. Before saving the head register we nudge the head position to
> + * be correctly aligned with a QWORD boundary, which brings it up to the next
> + * presumably valid instruction. Typically, at the point of hang recovery the
> + * head register will be pointing to the last DWORD of the BB_START
> + * instruction, which is followed by a padding MI_NOOP inserted by the
> + * driver.
> + *
> + * Returns:
> + * 	0 if ok, otherwise propagates error codes.
> + */
> +static int
> +gen8_ring_save(struct intel_engine_cs *ring, struct drm_i915_gem_request *req,
> +		bool force_advance)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct intel_ringbuffer *ringbuf = NULL;
> +	struct intel_context *ctx;
> +	int ret = 0;
> +	int clamp_to_tail = 0;
> +	uint32_t head;
> +	uint32_t tail;
> +	uint32_t head_addr;
> +	uint32_t tail_addr;
> +
> +	if (WARN_ON(!req))
> +	    return -EINVAL;
> +
> +	ctx = req->ctx;
> +	ringbuf = ctx->engine[ring->id].ringbuf;
> +
> +	/*
> +	 * Read head from MMIO register since it contains the
> +	 * most up to date value of head at this point.
> +	 */
> +	head = I915_READ_HEAD(ring);
> +
> +	/*
> +	 * Read tail from the context because the execlist queue
> +	 * updates the tail value there first during submission.
> +	 * The MMIO tail register is not updated until the actual
> +	 * ring submission completes.
> +	 */
> +	ret = I915_READ_TAIL_CTX(ring, ctx, tail);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * head_addr and tail_addr are the head and tail values
> +	 * excluding ring wrapping information and aligned to DWORD
> +	 * boundary
> +	 */
> +	head_addr = head & HEAD_ADDR;
> +	tail_addr = tail & TAIL_ADDR;
> +
> +	/*
> +	 * The head must always chase the tail.
> +	 * If the tail is beyond the head then do not allow
> +	 * the head to overtake it. If the tail is less than
> +	 * the head then the tail has already wrapped and
> +	 * there is no problem in advancing the head or even
> +	 * wrapping the head back to 0 as worst case it will
> +	 * become equal to tail
> +	 */
> +	if (head_addr <= tail_addr)
> +		clamp_to_tail = 1;
> +
> +	if (force_advance) {
> +
> +		/* Force head pointer to next QWORD boundary */
> +		head_addr &= ~0x7;
> +		head_addr += 8;
> +
> +	} else if (head & 0x7) {
> +
> +		/* Ensure head pointer is pointing to a QWORD boundary */
> +		head += 0x7;
> +		head &= ~0x7;
> +		head_addr = head;
> +	}
> +
> +	if (clamp_to_tail && (head_addr > tail_addr)) {
> +		head_addr = tail_addr;
> +	} else if (head_addr >= ringbuf->size) {
> +		/* Wrap head back to start if it exceeds ring size */
> +		head_addr = 0;
> +	}
> +
> +	head &= ~HEAD_ADDR;
> +	head |= (head_addr & HEAD_ADDR);
> +	ring->saved_head = head;
> +
> +	return 0;
> +}
> +
> +
> +/**
> + * gen8_ring_restore() - restore previously saved engine state
> + * @ring: engine whose state is to be restored
> + * @req: request containing the context currently running on engine
> + *
> + * Reinitializes engine and restores the previously saved engine state.
> + * See: gen8_ring_save()
> + *
> + * Returns:
> + * 	0 if ok, otherwise propagates error codes.
> + */
> +static int
> +gen8_ring_restore(struct intel_engine_cs *ring, struct drm_i915_gem_request *req)
> +{
> +	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> +	struct intel_context *ctx;
> +
> +	if (WARN_ON(!req))
> +	    return -EINVAL;
> +
> +	ctx = req->ctx;
> +
> +	/* Re-initialize ring */
> +	if (ring->init_hw) {
> +		int ret = ring->init_hw(ring);
> +		if (ret != 0) {
> +			DRM_ERROR("Failed to re-initialize %s\n",
> +					ring->name);
> +			return ret;
> +		}
> +	} else {
> +		DRM_ERROR("ring init function pointer not set up\n");
> +		return -EINVAL;
> +	}
> +
> +	if (ring->id == RCS) {
> +		/*
> +		 * These register reinitializations are only located here
> +		 * temporarily until they are moved out of the
> +		 * init_clock_gating function to some function we can
> +		 * call from here.
> +		 */
> +
> +		/* WaVSRefCountFullforceMissDisable:chv */
> +		/* WaDSRefCountFullforceMissDisable:chv */
> +		I915_WRITE(GEN7_FF_THREAD_MODE,
> +			   I915_READ(GEN7_FF_THREAD_MODE) &
> +			   ~(GEN8_FF_DS_REF_CNT_FFME | GEN7_FF_VS_REF_CNT_FFME));
> +
> +		I915_WRITE(_3D_CHICKEN3,
> +			   _3D_CHICKEN_SDE_LIMIT_FIFO_POLY_DEPTH(2));
> +
> +		/* WaSwitchSolVfFArbitrationPriority:bdw */
> +		I915_WRITE(GAM_ECOCHK, I915_READ(GAM_ECOCHK) | HSW_ECOCHK_ARB_PRIO_SOL);
> +	}
> +
> +	/* Restore head */
> +
> +	I915_WRITE_HEAD(ring, ring->saved_head);
> +	I915_WRITE_HEAD_CTX(ring, ctx, ring->saved_head);
> +
> +	return 0;
> +}
> +
>  static int gen8_init_rcs_context(struct drm_i915_gem_request *req)
>  {
>  	int ret;
> @@ -2021,6 +2418,10 @@ static int logical_render_ring_init(struct drm_device *dev)
>  	ring->irq_get = gen8_logical_ring_get_irq;
>  	ring->irq_put = gen8_logical_ring_put_irq;
>  	ring->emit_bb_start = gen8_emit_bb_start;
> +	ring->enable = gen8_ring_enable;
> +	ring->disable = gen8_ring_disable;
> +	ring->save = gen8_ring_save;
> +	ring->restore = gen8_ring_restore;
>  
>  	ring->dev = dev;
>  
> @@ -2073,6 +2474,10 @@ static int logical_bsd_ring_init(struct drm_device *dev)
>  	ring->irq_get = gen8_logical_ring_get_irq;
>  	ring->irq_put = gen8_logical_ring_put_irq;
>  	ring->emit_bb_start = gen8_emit_bb_start;
> +	ring->enable = gen8_ring_enable;
> +	ring->disable = gen8_ring_disable;
> +	ring->save = gen8_ring_save;
> +	ring->restore = gen8_ring_restore;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -2098,6 +2503,10 @@ static int logical_bsd2_ring_init(struct drm_device *dev)
>  	ring->irq_get = gen8_logical_ring_get_irq;
>  	ring->irq_put = gen8_logical_ring_put_irq;
>  	ring->emit_bb_start = gen8_emit_bb_start;
> +	ring->enable = gen8_ring_enable;
> +	ring->disable = gen8_ring_disable;
> +	ring->save = gen8_ring_save;
> +	ring->restore = gen8_ring_restore;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -2128,6 +2537,10 @@ static int logical_blt_ring_init(struct drm_device *dev)
>  	ring->irq_get = gen8_logical_ring_get_irq;
>  	ring->irq_put = gen8_logical_ring_put_irq;
>  	ring->emit_bb_start = gen8_emit_bb_start;
> +	ring->enable = gen8_ring_enable;
> +	ring->disable = gen8_ring_disable;
> +	ring->save = gen8_ring_save;
> +	ring->restore = gen8_ring_restore;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -2158,6 +2571,10 @@ static int logical_vebox_ring_init(struct drm_device *dev)
>  	ring->irq_get = gen8_logical_ring_get_irq;
>  	ring->irq_put = gen8_logical_ring_put_irq;
>  	ring->emit_bb_start = gen8_emit_bb_start;
> +	ring->enable = gen8_ring_enable;
> +	ring->disable = gen8_ring_disable;
> +	ring->save = gen8_ring_save;
> +	ring->restore = gen8_ring_restore;
>  
>  	return logical_ring_init(dev, ring);
>  }
> @@ -2587,3 +3004,127 @@ void intel_lr_context_reset(struct drm_device *dev,
>  		ringbuf->tail = 0;
>  	}
>  }
> +
> +/**
> + * intel_execlists_TDR_get_current_request() - return request currently
> + * processed by engine
> + *
> + * @ring: Engine currently running context to be returned.
> + *
> + * @req:  Output parameter containing the current request (the request at the
> + *	  head of execlist queue corresponding to the given ring). May be NULL
> + *	  if no request has been submitted to the execlist queue of this
> + *	  engine. If the req parameter passed in to the function is not NULL
> + *	  and a request is found and returned the request is referenced before
> + *	  it is returned. It is the responsibility of the caller to dereference
> + *	  it at the end of its life cycle.
> + *
> + * Return:
> + *	CONTEXT_SUBMISSION_STATUS_OK if request is found to be submitted and its
> + *	context is currently running on engine.
> + *
> + *	CONTEXT_SUBMISSION_STATUS_INCONSISTENT if request is found to be submitted
> + *	but its context is not in a state that is consistent with current
> + *	hardware state for the given engine. This has been observed in three cases:
> + *
> + *		1. Before the engine has switched to this context after it has
> + *		been submitted to the execlist queue.
> + *
> + *		2. After the engine has switched away from this context but
> + *		before the context has been removed from the execlist queue.
> + *
> + *		3. The driver has lost an interrupt. Typically the hardware has
> + *		gone to idle but the driver still thinks the context belonging to
> + *		the request at the head of the queue is still executing.
> + *
> + *	CONTEXT_SUBMISSION_STATUS_NONE_SUBMITTED if no context has been found
> + *	to be submitted to the execlist queue and if the hardware is idle.
> + */
> +enum context_submission_status
> +intel_execlists_TDR_get_current_request(struct intel_engine_cs *ring,
> +		struct drm_i915_gem_request **req)
> +{
> +	struct drm_i915_private *dev_priv;
> +	unsigned long flags;
> +	struct drm_i915_gem_request *tmpreq = NULL;
> +	struct intel_context *tmpctx = NULL;
> +	unsigned hw_context = 0;
> +	unsigned sw_context = 0;
> +	bool hw_active = false;
> +	enum context_submission_status status =
> +			CONTEXT_SUBMISSION_STATUS_UNDEFINED;
> +
> +	if (WARN_ON(!ring))
> +		return status;
> +
> +	dev_priv = ring->dev->dev_private;
> +
> +	intel_uncore_forcewake_get(dev_priv, FORCEWAKE_ALL);
> +	spin_lock_irqsave(&ring->execlist_lock, flags);
> +	hw_context = I915_READ(RING_EXECLIST_STATUS_CTX_ID(ring));
> +
> +	hw_active = (I915_READ(RING_EXECLIST_STATUS_LO(ring)) &
> +		EXECLIST_STATUS_CURRENT_ACTIVE_ELEMENT_STATUS) ? true : false;
> +
> +	tmpreq = list_first_entry_or_null(&ring->execlist_queue,
> +		struct drm_i915_gem_request, execlist_link);
> +
> +	if (tmpreq) {
> +		sw_context = intel_execlists_ctx_id((tmpreq->ctx)->engine[ring->id].state);
> +
> +		/*
> +		 * Only acknowledge the request in the execlist queue if it's
> +		 * actually been submitted to hardware, otherwise there's the
> +		 * risk of a false inconsistency detection between the
> +		 * (unsubmitted) request and the idle hardware state.
> +		 */
> +		if (tmpreq->elsp_submitted > 0) {
> +			/*
> +			 * If the caller has not passed a non-NULL req
> +			 * parameter then it is not interested in getting a
> +			 * request reference back.  Don't temporarily grab a
> +			 * reference since holding the execlist lock is enough
> +			 * to ensure that the execlist code will hold its
> +			 * reference all throughout this function. As long as
> +			 * that reference is kept there is no need for us to
> +			 * take yet another reference.  The reason why this is
> +			 * of interest is because certain callers, such as the
> +			 * TDR hang checker, cannot grab struct_mutex before
> +			 * calling and because of that we cannot dereference
> +			 * any requests (DRM might assert if we do). Just rely
> +			 * on the execlist code to provide indirect protection.
> +			 */
> +			if (req)
> +				i915_gem_request_reference(tmpreq);
> +
> +			if (tmpreq->ctx)
> +				tmpctx = tmpreq->ctx;
> +		}
> +	}
> +
> +	if (tmpctx) {
> +		status = ((hw_context == sw_context) && hw_active) ?
> +				CONTEXT_SUBMISSION_STATUS_OK :
> +				CONTEXT_SUBMISSION_STATUS_INCONSISTENT;
> +	} else {
> +		/*
> +		 * If we don't have any queue entries and the
> +		 * EXECLIST_STATUS register points to zero we are
> +		 * clearly not processing any context right now
> +		 */
> +		WARN((hw_context || hw_active), "hw_context=%x, hardware %s!\n",
> +			hw_context, hw_active ? "not idle":"idle");
> +
> +		status = (hw_context || hw_active) ?
> +			CONTEXT_SUBMISSION_STATUS_INCONSISTENT :
> +			CONTEXT_SUBMISSION_STATUS_NONE_SUBMITTED;
> +	}
> +
> +	if (req)
> +		*req = tmpreq;
> +
> +	spin_unlock_irqrestore(&ring->execlist_lock, flags);
> +	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> +
> +	return status;
> +}
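
The referencing rules above are subtle, so just to confirm my reading of the
contract with a usage sketch (hypothetical caller, not part of this patch):
passing NULL only returns the status, while passing a req pointer hands back a
referenced request, per the kerneldoc, that the caller must drop with
i915_gem_request_unreference() (or the __unlocked flavour if struct_mutex is
not held):

  struct drm_i915_gem_request *req = NULL;
  enum context_submission_status status;

  /* status only, no reference taken */
  status = intel_execlists_TDR_get_current_request(ring, NULL);

  /* status plus a referenced request */
  status = intel_execlists_TDR_get_current_request(ring, &req);
  if (req) {
	/* ... inspect req / req->ctx ... */
	i915_gem_request_unreference__unlocked(req);
  }
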
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index de41ad6..d9acb31 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -29,7 +29,9 @@
>  /* Execlists regs */
>  #define RING_ELSP(ring)				_MMIO((ring)->mmio_base + 0x230)
>  #define RING_EXECLIST_STATUS_LO(ring)		_MMIO((ring)->mmio_base + 0x234)
> +#define	  EXECLIST_STATUS_CURRENT_ACTIVE_ELEMENT_STATUS	(0x3 << 14)
>  #define RING_EXECLIST_STATUS_HI(ring)		_MMIO((ring)->mmio_base + 0x234 + 4)
> +#define RING_EXECLIST_STATUS_CTX_ID(ring)	RING_EXECLIST_STATUS_HI(ring)
>  #define RING_CONTEXT_CONTROL(ring)		_MMIO((ring)->mmio_base + 0x244)
>  #define	  CTX_CTRL_INHIBIT_SYN_CTX_SWITCH	(1 << 3)
>  #define	  CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT	(1 << 0)
> @@ -118,4 +120,16 @@ u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
>  void intel_lrc_irq_handler(struct intel_engine_cs *ring);
>  void intel_execlists_retire_requests(struct intel_engine_cs *ring);
>  
> +int intel_execlists_read_tail(struct intel_engine_cs *ring,
> +			 struct intel_context *ctx,
> +			 u32 *tail);
> +
> +int intel_execlists_write_head(struct intel_engine_cs *ring,
> +			  struct intel_context *ctx,
> +			  u32 head);
> +
> +int intel_execlists_read_head(struct intel_engine_cs *ring,
> +			 struct intel_context *ctx,
> +			 u32 *head);
> +
>  #endif /* _INTEL_LRC_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_lrc_tdr.h b/drivers/gpu/drm/i915/intel_lrc_tdr.h
> new file mode 100644
> index 0000000..4520753
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/intel_lrc_tdr.h
> @@ -0,0 +1,36 @@
> +/*
> + * Copyright © 2015 Intel Corporation
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a
> + * copy of this software and associated documentation files (the "Software"),
> + * to deal in the Software without restriction, including without limitation
> + * the rights to use, copy, modify, merge, publish, distribute, sublicense,
> + * and/or sell copies of the Software, and to permit persons to whom the
> + * Software is furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice (including the next
> + * paragraph) shall be included in all copies or substantial portions of the
> + * Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
> + * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
> + * DEALINGS IN THE SOFTWARE.
> + */
> +
> +#ifndef _INTEL_LRC_TDR_H_
> +#define _INTEL_LRC_TDR_H_
> +
> +/* Privileged execlist API used exclusively by TDR */
> +
> +void intel_execlists_TDR_context_resubmission(struct intel_engine_cs *ring);
> +
> +enum context_submission_status
> +intel_execlists_TDR_get_current_request(struct intel_engine_cs *ring,
> +		struct drm_i915_gem_request **req);
> +
> +#endif /* _INTEL_LRC_TDR_H_ */
> +
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 4060acf..def0dcf 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -434,6 +434,88 @@ static void ring_write_tail(struct intel_engine_cs *ring,
>  	I915_WRITE_TAIL(ring, value);
>  }
>  
> +int intel_ring_disable(struct intel_engine_cs *ring)
> +{
> +	WARN_ON(!ring);
> +
> +	if (ring && ring->disable)
> +		return ring->disable(ring);
> +	else {
> +		DRM_ERROR("Ring disable not supported on %s\n", ring->name);
> +		return -EINVAL;
> +	}
> +}
> +
> +int intel_ring_enable(struct intel_engine_cs *ring)
> +{
> +	WARN_ON(!ring);
> +
> +	if (ring && ring->enable)
> +		return ring->enable(ring);
> +	else {
> +		DRM_ERROR("Ring enable not supported on %s\n", ring->name);
> +		return -EINVAL;
> +	}
> +}
> +
> +int intel_ring_save(struct intel_engine_cs *ring,
> +		struct drm_i915_gem_request *req,
> +		bool force_advance)
> +{
> +	WARN_ON(!ring);
> +
> +	if (ring && ring->save)
> +		return ring->save(ring, req, force_advance);
> +	else {
> +		DRM_ERROR("Ring save not supported on %s\n", ring->name);
> +		return -EINVAL;
> +	}
> +}
> +
> +int intel_ring_restore(struct intel_engine_cs *ring,
> +		struct drm_i915_gem_request *req)
> +{
> +	WARN_ON(!ring);
> +
> +	if (ring && ring->restore)
> +		return ring->restore(ring, req);
> +	else {
> +		DRM_ERROR("Ring restore not supported on %s\n", ring->name);
> +		return -EINVAL;
> +	}
> +}
> +
> +void intel_gpu_engine_reset_resample(struct intel_engine_cs *ring,
> +		struct drm_i915_gem_request *req)
> +{
> +	struct intel_ringbuffer *ringbuf;
> +	struct drm_i915_private *dev_priv;
> +
> +	if (WARN_ON(!ring))
> +		return;
> +
> +	dev_priv = ring->dev->dev_private;
> +
> +	if (i915.enable_execlists) {
> +		struct intel_context *ctx;
> +
> +		if (WARN_ON(!req))
> +			return;
> +
> +		ctx = req->ctx;
> +		ringbuf = ctx->engine[ring->id].ringbuf;
> +
> +		/*
> +		 * In gen8+ context head is restored during reset and
> +		 * we can use it as a reference to set up the new
> +		 * driver state.
> +		 */
> +		I915_READ_HEAD_CTX(ring, ctx, ringbuf->head);
> +		ringbuf->last_retired_head = -1;
> +		intel_ring_update_space(ringbuf);
> +	}
> +}
> +
>  u64 intel_ring_get_active_head(struct intel_engine_cs *ring)
>  {
>  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> @@ -629,7 +711,7 @@ static int init_ring_common(struct intel_engine_cs *ring)
>  	ringbuf->tail = I915_READ_TAIL(ring) & TAIL_ADDR;
>  	intel_ring_update_space(ringbuf);
>  
> -	memset(&ring->hangcheck, 0, sizeof(ring->hangcheck));
> +	i915_hangcheck_reinit(ring);
>  
>  out:
>  	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 7349d92..7014778 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -49,6 +49,22 @@ struct  intel_hw_status_page {
>  #define I915_READ_MODE(ring) I915_READ(RING_MI_MODE((ring)->mmio_base))
>  #define I915_WRITE_MODE(ring, val) I915_WRITE(RING_MI_MODE((ring)->mmio_base), val)
>  
> +
> +#define I915_READ_TAIL_CTX(engine, ctx, outval) \
> +	intel_execlists_read_tail((engine), \
> +				(ctx), \
> +				&(outval));
> +
> +#define I915_READ_HEAD_CTX(engine, ctx, outval) \
> +	intel_execlists_read_head((engine), \
> +				(ctx), \
> +				&(outval));
> +
> +#define I915_WRITE_HEAD_CTX(engine, ctx, val) \
> +	intel_execlists_write_head((engine), \
> +				(ctx), \
> +				(val));
> +


I don't see the benefit of all these macros.

If you look at lrc_reg_state we can throw most, if not all, of this
register reading/writing code out; something like the sketch below.
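
Rough sketch of what I mean (assuming the context image is pinned while we
poke at it, and that CTX_RING_HEAD from intel_lrc.c is still the index of the
RING_HEAD register in the state page, with the value in the following slot):

  static u32 ctx_saved_ring_head(struct intel_engine_cs *ring,
				 struct intel_context *ctx)
  {
	u32 *reg_state = ctx->engine[ring->id].lrc_reg_state;

	/* reg_state[CTX_RING_HEAD] holds the reg offset, +1 holds the value */
	return reg_state[CTX_RING_HEAD + 1];
  }

The resample/save/restore paths could then read and write head/tail directly
in the context image instead of going through the I915_*_CTX macros.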


>  /* seqno size is actually only a uint32, but since we plan to use MI_FLUSH_DW to
>   * do the writes, and that must have qw aligned offsets, simply pretend it's 8b.
>   */
> @@ -94,6 +110,34 @@ struct intel_ring_hangcheck {
>  	enum intel_ring_hangcheck_action action;
>  	int deadlock;
>  	u32 instdone[I915_NUM_INSTDONE_REG];
> +
> +	/*
> +	 * Last recorded ring head index.
> +	 * This is only ever a ring index where as active
> +	 * head may be a graphics address in a ring buffer
> +	 */
> +	u32 last_head;
> +
> +	/* Flag to indicate if engine reset required */
> +	atomic_t flags;
> +
> +	/* Indicates request to reset this engine */
> +#define I915_ENGINE_RESET_IN_PROGRESS (1<<0)
> +
> +	/*
> +	 * Timestamp (seconds) from when the last time
> +	 * this engine was reset.
> +	 */
> +	u32 last_engine_reset_time;
> +
> +	/*
> +	 * Number of times this engine has been
> +	 * reset since boot
> +	 */
> +	u32 reset_count;
> +
> +	/* Number of TDR hang detections */
> +	u32 tdr_count;
>  };
>  
>  struct intel_ringbuffer {
> @@ -205,6 +249,14 @@ struct  intel_engine_cs {
>  #define I915_DISPATCH_RS     0x4
>  	void		(*cleanup)(struct intel_engine_cs *ring);
>  
> +	int (*enable)(struct intel_engine_cs *ring);
> +	int (*disable)(struct intel_engine_cs *ring);
> +	int (*save)(struct intel_engine_cs *ring,
> +		    struct drm_i915_gem_request *req,
> +		    bool force_advance);
> +	int (*restore)(struct intel_engine_cs *ring,
> +		       struct drm_i915_gem_request *req);
> +
>  	/* GEN8 signal/wait table - never trust comments!
>  	 *	  signal to	signal to    signal to   signal to      signal to
>  	 *	    RCS		   VCS          BCS        VECS		 VCS2
> @@ -311,6 +363,9 @@ struct  intel_engine_cs {
>  
>  	struct intel_ring_hangcheck hangcheck;
>  
> +	/* Saved head value to be restored after reset */
> +	u32 saved_head;
> +
>  	struct {
>  		struct drm_i915_gem_object *obj;
>  		u32 gtt_offset;
> @@ -463,6 +518,15 @@ void intel_ring_update_space(struct intel_ringbuffer *ringbuf);
>  int intel_ring_space(struct intel_ringbuffer *ringbuf);
>  bool intel_ring_stopped(struct intel_engine_cs *ring);
>  
> +void intel_gpu_engine_reset_resample(struct intel_engine_cs *ring,
> +		struct drm_i915_gem_request *req);
> +int intel_ring_disable(struct intel_engine_cs *ring);
> +int intel_ring_enable(struct intel_engine_cs *ring);
> +int intel_ring_save(struct intel_engine_cs *ring,
> +		struct drm_i915_gem_request *req, bool force_advance);
> +int intel_ring_restore(struct intel_engine_cs *ring,
> +		struct drm_i915_gem_request *req);
> +
>  int __must_check intel_ring_idle(struct intel_engine_cs *ring);
>  void intel_ring_init_seqno(struct intel_engine_cs *ring, u32 seqno);
>  int intel_ring_flush_all_caches(struct drm_i915_gem_request *req);
> diff --git a/drivers/gpu/drm/i915/intel_uncore.c b/drivers/gpu/drm/i915/intel_uncore.c
> index 2df4246..f20548c 100644
> --- a/drivers/gpu/drm/i915/intel_uncore.c
> +++ b/drivers/gpu/drm/i915/intel_uncore.c
> @@ -1623,6 +1623,153 @@ bool intel_has_gpu_reset(struct drm_device *dev)
>  	return intel_get_gpu_reset(dev) != NULL;
>  }
>  
> +static inline int wait_for_engine_reset(struct drm_i915_private *dev_priv,
> +		unsigned int grdom)
> +{

No need to inline

> +#define _CND ((__raw_i915_read32(dev_priv, GEN6_GDRST) & grdom) == 0)
> +
> +	/*
> +	 * Spin waiting for the device to ack the reset request.
> +	 * Times out after 500 us
> +	 * */
> +	return wait_for_atomic_us(_CND, 500);
> +
> +#undef _CND
> +}
> +
> +static int do_engine_reset_nolock(struct intel_engine_cs *engine)
> +{
> +	int ret = -ENODEV;
> +	struct drm_i915_private *dev_priv = engine->dev->dev_private;
> +
> +	assert_spin_locked(&dev_priv->uncore.lock);
> +
> +	switch (engine->id) {
> +	case RCS:
> +		__raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_RENDER);
> +		engine->hangcheck.reset_count++;
> +		ret = wait_for_engine_reset(dev_priv, GEN6_GRDOM_RENDER);
> +		break;
> +
> +	case BCS:
> +		__raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_BLT);
> +		engine->hangcheck.reset_count++;
> +		ret = wait_for_engine_reset(dev_priv, GEN6_GRDOM_BLT);
> +		break;
> +
> +	case VCS:
> +		__raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_MEDIA);
> +		engine->hangcheck.reset_count++;
> +		ret = wait_for_engine_reset(dev_priv, GEN6_GRDOM_MEDIA);
> +		break;
> +
> +	case VECS:
> +		__raw_i915_write32(dev_priv, GEN6_GDRST, GEN6_GRDOM_VECS);
> +		engine->hangcheck.reset_count++;
> +		ret = wait_for_engine_reset(dev_priv, GEN6_GRDOM_VECS);
> +		break;
> +
> +	case VCS2:
> +		__raw_i915_write32(dev_priv, GEN6_GDRST, GEN8_GRDOM_MEDIA2);
> +		engine->hangcheck.reset_count++;
> +		ret = wait_for_engine_reset(dev_priv, GEN8_GRDOM_MEDIA2);
> +		break;
> +
> +	default:
> +		DRM_ERROR("Unexpected engine: %d\n", engine->id);
> +		break;
> +	}

Something like (mask table indexed by engine id, sanity check included):

  static const u32 mask[I915_NUM_RINGS] = {
	[RCS] = GEN6_GRDOM_RENDER, [BCS] = GEN6_GRDOM_BLT,
	[VCS] = GEN6_GRDOM_MEDIA, [VCS2] = GEN8_GRDOM_MEDIA2,
	[VECS] = GEN6_GRDOM_VECS,
  };

  if (WARN_ON_ONCE(!intel_ring_initialized(engine)))
	return -ENODEV;

  __raw_i915_write32(dev_priv, GEN6_GDRST, mask[engine->id]);
  engine->hangcheck.reset_count++;
  ret = wait_for_engine_reset(dev_priv, mask[engine->id]);

> +
> +	return ret;
> +}
> +
> +static int gen8_do_engine_reset(struct intel_engine_cs *engine)
> +{
> +	struct drm_device *dev = engine->dev;
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	int ret = -ENODEV;
> +	unsigned long irqflags;
> +
> +	spin_lock_irqsave(&dev_priv->uncore.lock, irqflags);
> +	ret = do_engine_reset_nolock(engine);
> +	spin_unlock_irqrestore(&dev_priv->uncore.lock, irqflags);
> +
> +	if (!ret) {
> +		u32 reset_ctl = 0;
> +
> +		/*
> +		 * Confirm that reset control register back to normal
> +		 * following the reset.
> +		 */
> +		reset_ctl = I915_READ(RING_RESET_CTL(engine->mmio_base));
> +		WARN(reset_ctl & 0x3, "Reset control still active after reset! (0x%08x)\n",
> +			reset_ctl);
> +	} else {
> +		DRM_ERROR("Engine reset failed! (%d)\n", ret);
> +	}
> +
> +	return ret;
> +}
> +
> +int intel_gpu_engine_reset(struct intel_engine_cs *engine)
> +{
> +	/* Reset an individual engine */
> +	int ret = -ENODEV;
> +	struct drm_device *dev = engine->dev;
> +
> +	switch (INTEL_INFO(dev)->gen) {

You can pass dev_priv to INTEL_INFO as well; please prefer doing that here
and in the rest of the code.

> +	case 8:
case 9: ?

Thanks,
-Mika

> +		ret = gen8_do_engine_reset(engine);
> +		break;
> +	default:
> +		DRM_ERROR("Per Engine Reset not supported on Gen%d\n",
> +			  INTEL_INFO(dev)->gen);
> +		break;
> +	}
> +
> +	return ret;
> +}
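
Or, instead of enumerating gens in the switch, this could mirror
intel_request_gpu_engine_reset() below and gate on gen >= 8, so new gens are
picked up automatically (sketch only, assuming gen9 shares the gen8
GDRST/RESET_CTL flow):

  int intel_gpu_engine_reset(struct intel_engine_cs *engine)
  {
	struct drm_device *dev = engine->dev;

	if (INTEL_INFO(dev)->gen >= 8)
		return gen8_do_engine_reset(engine);

	DRM_ERROR("Per engine reset not supported on Gen%d\n",
		  INTEL_INFO(dev)->gen);
	return -ENODEV;
  }
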
> +
> +/*
> + * On gen8+ a reset request has to be issued via the reset control register
> + * before a GPU engine can be reset in order to stop the command streamer
> + * and idle the engine. This replaces the legacy way of stopping an engine
> + * by writing to the stop ring bit in the MI_MODE register.
> + */
> +int intel_request_gpu_engine_reset(struct intel_engine_cs *engine)
> +{
> +	/* Request reset for an individual engine */
> +	int ret = -ENODEV;
> +	struct drm_device *dev = engine->dev;
> +
> +	if (INTEL_INFO(dev)->gen >= 8)
> +		ret = gen8_request_engine_reset(engine);
> +	else
> +		DRM_ERROR("Reset request not supported on Gen%d\n",
> +			  INTEL_INFO(dev)->gen);
> +
> +	return ret;
> +}
> +
> +/*
> + * It is possible to back off from a previously issued reset request by simply
> + * clearing the reset request bit in the reset control register.
> + */
> +int intel_unrequest_gpu_engine_reset(struct intel_engine_cs *engine)
> +{
> +	/* Roll back reset request for an individual engine */
> +	int ret = -ENODEV;
> +	struct drm_device *dev = engine->dev;
> +
> +	if (INTEL_INFO(dev)->gen >= 8)
> +		ret = gen8_unrequest_engine_reset(engine);
> +	else
> +		DRM_ERROR("Reset unrequest not supported on Gen%d\n",
> +			  INTEL_INFO(dev)->gen);
> +
> +	return ret;
> +}
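
For reference, my mental model of the request/unrequest step described in the
comments above, as a sketch only -- the actual gen8_request_engine_reset() /
gen8_unrequest_engine_reset() are defined elsewhere in this patch, and the
RESET_CTL_REQUEST_RESET / RESET_CTL_READY_TO_RESET bit names are assumed here:

  static int gen8_request_engine_reset(struct intel_engine_cs *engine)
  {
	struct drm_i915_private *dev_priv = engine->dev->dev_private;
	int ret;

	/* Ask the engine to stop the command streamer and go idle */
	I915_WRITE(RING_RESET_CTL(engine->mmio_base),
		   _MASKED_BIT_ENABLE(RESET_CTL_REQUEST_RESET));

	/* Wait for the ready-for-reset ack before hitting GDRST */
	ret = wait_for(I915_READ(RING_RESET_CTL(engine->mmio_base)) &
		       RESET_CTL_READY_TO_RESET, 700);
	if (ret)
		DRM_ERROR("%s: reset request timed out\n", engine->name);

	return ret;
  }

  static int gen8_unrequest_engine_reset(struct intel_engine_cs *engine)
  {
	struct drm_i915_private *dev_priv = engine->dev->dev_private;

	/* Back off: clear the reset request bit again */
	I915_WRITE(RING_RESET_CTL(engine->mmio_base),
		   _MASKED_BIT_DISABLE(RESET_CTL_REQUEST_RESET));

	return 0;
  }
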
> +
>  bool intel_uncore_unclaimed_mmio(struct drm_i915_private *dev_priv)
>  {
>  	return check_for_unclaimed_mmio(dev_priv);
> -- 
> 1.9.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
http://lists.freedesktop.org/mailman/listinfo/intel-gfx




