Re: [PATCH 3/5] drm/i915/guc: Use correct error code for GuC initialization failure

"Michal Wajdeczko" <michal.wajdeczko@xxxxxxxxx> · Thu, 01 Feb 2018 15:23:14 +0100

On Thu, 01 Feb 2018 13:35:15 +0100, Chris Wilson  
<chris@xxxxxxxxxxxxxxxxxx> wrote:

Quoting Sagar Arun Kamble (2018-02-01 11:48:54)

On 1/31/2018 11:02 PM, Michal Wajdeczko wrote:
> Since commit 6ca9a2beb5 ("drm/i915: Unwind i915_gem_init() failure")
> we believed that we correctly handle all errors encountered during
> GuC initialization, including special one that indicates request to
> run driver with disabled GPU submission (-EIO).
>
> Unfortunately since commit 121981fafe ("drm/i915/guc: Combine
> enable_guc_loading|submission modparams") we stopped using that
> error code to avoid unwanted fallback to execlist submission mode.
>
> In result any GuC initialization failure was treated as  
non-recoverable
> error leading to driver load abort, so we could not even read related
> GuC error log to investigate cause of the problem.
>
> Fix that by always returning -EIO on uC hardware related failure.
>
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx>
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Michal Winiarski <michal.winiarski@xxxxxxxxx>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
> Cc: Sagar Arun Kamble <sagar.a.kamble@xxxxxxxxx>
> ---
>   drivers/gpu/drm/i915/intel_uc.c | 5 +----
>   1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_uc.c  
b/drivers/gpu/drm/i915/intel_uc.c
> index 3a13cbb..1547e16 100644
> --- a/drivers/gpu/drm/i915/intel_uc.c
> +++ b/drivers/gpu/drm/i915/intel_uc.c
> @@ -432,11 +432,8 @@ int intel_uc_init_hw(struct drm_i915_private  
*dev_priv)
>        * Note that there is no fallback as either user explicitly  
asked for
>        * the GuC or driver default option was to run with the GuC  
enabled.
>        */
> -     if (GEM_WARN_ON(ret == -EIO))
> -             ret = -EINVAL;
> -
>       dev_err(dev_priv->drm.dev, "GuC initialization failed %d\n",  
ret);
> -     return ret;
> +     return -EIO; /* We want to disable GPU submission but keep KMS  
alive */
>   }
We'll need to avoid uc_fini_hw/ in unload path since GPU will be marked
wedged after this?

We should be avoiding uc_fini_hw in unload because we haven't completed
uc_init_hw. Such failure should already be handled?

It's even worst. In case of uc_init_hw() failure we always call uc_fini()
and uc_fini_misc() regardless of EIO, and then in case of running
in wedged more we will try to call them again in unload path ...

While it is easy to postpone uc_fini[_misc] calls to unload stage,
it is tricky to avoid calling uc_fini_hw there without adding extra check
(earlier there was discussion that we should not have any conditions
in _fini functions)

Any ideas how to solve that ?

Michal
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx