Re: [PATCH 3/5] drm/i915/guc: Use correct error code for GuC initialization failure

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 2/1/2018 7:53 PM, Michal Wajdeczko wrote:
On Thu, 01 Feb 2018 13:35:15 +0100, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:

Quoting Sagar Arun Kamble (2018-02-01 11:48:54)


On 1/31/2018 11:02 PM, Michal Wajdeczko wrote:
> Since commit 6ca9a2beb5 ("drm/i915: Unwind i915_gem_init() failure")
> we believed that we correctly handle all errors encountered during
> GuC initialization, including special one that indicates request to
> run driver with disabled GPU submission (-EIO).
>
> Unfortunately since commit 121981fafe ("drm/i915/guc: Combine
> enable_guc_loading|submission modparams") we stopped using that
> error code to avoid unwanted fallback to execlist submission mode.
>
> In result any GuC initialization failure was treated as non-recoverable
> error leading to driver load abort, so we could not even read related
> GuC error log to investigate cause of the problem.
>
> Fix that by always returning -EIO on uC hardware related failure.
>
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx>
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Michal Winiarski <michal.winiarski@xxxxxxxxx>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
> Cc: Sagar Arun Kamble <sagar.a.kamble@xxxxxxxxx>
> ---
>   drivers/gpu/drm/i915/intel_uc.c | 5 +----
>   1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_uc.c b/drivers/gpu/drm/i915/intel_uc.c
> index 3a13cbb..1547e16 100644
> --- a/drivers/gpu/drm/i915/intel_uc.c
> +++ b/drivers/gpu/drm/i915/intel_uc.c
> @@ -432,11 +432,8 @@ int intel_uc_init_hw(struct drm_i915_private *dev_priv) >        * Note that there is no fallback as either user explicitly asked for >        * the GuC or driver default option was to run with the GuC enabled.
>        */
> -     if (GEM_WARN_ON(ret == -EIO))
> -             ret = -EINVAL;
> -
>       dev_err(dev_priv->drm.dev, "GuC initialization failed %d\n", ret);
> -     return ret;
> +     return -EIO; /* We want to disable GPU submission but keep KMS alive */
>   }
We'll need to avoid uc_fini_hw/ in unload path since GPU will be marked
wedged after this?

We should be avoiding uc_fini_hw in unload because we haven't completed
uc_init_hw. Such failure should already be handled?

It's even worst. In case of uc_init_hw() failure we always call uc_fini()
and uc_fini_misc() regardless of EIO, and then in case of running
in wedged more we will try to call them again in unload path ...

While it is easy to postpone uc_fini[_misc] calls to unload stage,
it is tricky to avoid calling uc_fini_hw there without adding extra check
(earlier there was discussion that we should not have any conditions
in _fini functions)

Any ideas how to solve that ?

Earlier we could rely on status of enable_guc_loading/submission. Now I think we need to have status per uc, uc_misc, uc_hw as many internal steps can't rely on like pointer check.
Michal

--
Thanks,
Sagar

_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux