Re: [PATCH v2] drm/i915: Taint (TAINT_DIE) the kernel if the GPU reset fails

> -----Original Message-----
> From: Intel-gfx [mailto:intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx] On Behalf
> Of Chris Wilson
> Sent: Wednesday, November 29, 2017 4:06 PM
> To: intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> Subject:  [PATCH v2] drm/i915: Taint (TAINT_DIE) the kernel if the
> GPU reset fails
> 
> History tells us that if we cannot reset the GPU now, we never will. This then
> impacts everything that is run subsequently. On failing the reset, we mark
> the driver as wedged, trying to prevent further execution on the GPU,
> forcing userspace to fallback to using the CPU to update its framebuffers and
> let the user know what happened.
> 
> We also want to go one step further and add a taint to the kernel so that any
> subsequent faults can be traced back to this failure. This is important for igt,
> where if the GPU/driver fails we want to reboot and restart testing rather
> than continue on into oblivion.

I am OK with this if the comment deciding what IGT wants to do when this state is collected is removed.
I just want to make it clear that the tainting feature is separate from what should be done when the system is tainted.
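
For illustration only: whoever consumes the taint (a test runner, for example) can check it from userspace, and the driver does not need to know what policy follows. A minimal sketch, assuming the standard /proc/sys/kernel/tainted interface and that TAINT_DIE is bit 7 of the mask (flag letter 'D'); the helper names are hypothetical and are not part of this patch or of IGT:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Bit position of TAINT_DIE in the kernel taint mask (flag letter 'D'). */
#define TAINT_DIE_BIT 7

/* Hypothetical helper: true if the TAINT_DIE bit is set in a taint mask. */
static bool taint_mask_has_die(unsigned long mask)
{
	return (mask >> TAINT_DIE_BIT) & 1UL;
}

/*
 * Hypothetical helper: read the decimal taint mask from path
 * (normally /proc/sys/kernel/tainted).
 * Returns 0 on success, -1 if the file cannot be opened or parsed.
 */
static int read_taint_mask(const char *path, unsigned long *mask)
{
	FILE *f;
	int ok;

	assert(path && mask);

	f = fopen(path, "r");
	if (!f)
		return -1;

	ok = fscanf(f, "%lu", mask) == 1;
	fclose(f);

	return ok ? 0 : -1;
}
```

What the caller does once taint_mask_has_die() returns true (reboot, abort the run, just log it) is then entirely its own decision.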

> 
> TAINT_DIE is colloquially known as "system on fire", which seems
> appropriate for unresponsive hardware.
> 
> v2: Also taint if the recovery fails (again history shows us that is typically
> fatal).
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=103514
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>
> Cc: Daniel Vetter <daniel.vetter@xxxxxxxx>
> Cc: Michał Winiarski <michal.winiarski@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_drv.c | 28 ++++++++++++++++++++++------
>  1 file changed, 22 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 696d5cdf2779..eb90ddac7f8b 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -1897,18 +1897,21 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
>  	disable_irq(i915->drm.irq);
>  	ret = i915_gem_reset_prepare(i915);
>  	if (ret) {
> -		DRM_ERROR("GPU recovery failed\n");
> +		dev_err(i915->drm.dev, "GPU recovery failed\n");
>  		intel_gpu_reset(i915, ALL_ENGINES);
> -		goto error;
> +		goto taint;
>  	}
> 
>  	ret = intel_gpu_reset(i915, ALL_ENGINES);
>  	if (ret) {
> -		if (ret != -ENODEV)
> -			DRM_ERROR("Failed to reset chip: %i\n", ret);
> -		else
> +		if (ret != -ENODEV) {
> +			dev_err(i915->drm.dev,
> +				"Failed to reset chip: %i\n", ret);
> +			goto taint;
> +		} else {
>  			DRM_DEBUG_DRIVER("GPU reset disabled\n");
> -		goto error;
> +			goto error;
> +		}
>  	}
> 
>  	i915_gem_reset(i915);
> @@ -1951,6 +1954,19 @@ void i915_reset(struct drm_i915_private *i915, unsigned int flags)
>  	wake_up_bit(&error->flags, I915_RESET_HANDOFF);
>  	return;
> 
> +taint:
> +	/*
> +	 * History tells us that if we cannot reset the GPU now, we
> +	 * never will. This then impacts everything that is run
> +	 * subsequently. On failing the reset, we mark the driver
> +	 * as wedged, preventing further execution on the GPU.
> +	 * We also want to go one step further and add a taint to the
> +	 * kernel so that any subsequent faults can be traced back to
> +	 * this failure. This is important for igt, where if the
> +	 * GPU/driver fails we want to reboot and restart testing
> +	 * rather than continue on into oblivion.
> +	 */
> +	add_taint(TAINT_DIE, LOCKDEP_STILL_OK);
>  error:
>  	i915_gem_set_wedged(i915);
>  	i915_gem_retire_requests(i915);
> --
> 2.15.0
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx