On 2/1/2018 7:53 PM, Michal Wajdeczko wrote:
On Thu, 01 Feb 2018 13:35:15 +0100, Chris Wilson
<chris@xxxxxxxxxxxxxxxxxx> wrote:
Quoting Sagar Arun Kamble (2018-02-01 11:48:54)
On 1/31/2018 11:02 PM, Michal Wajdeczko wrote:
> Since commit 6ca9a2beb5 ("drm/i915: Unwind i915_gem_init() failure")
> we believed that we correctly handle all errors encountered during
> GuC initialization, including special one that indicates request to
> run driver with disabled GPU submission (-EIO).
>
> Unfortunately since commit 121981fafe ("drm/i915/guc: Combine
> enable_guc_loading|submission modparams") we stopped using that
> error code to avoid unwanted fallback to execlist submission mode.
>
> In result any GuC initialization failure was treated as
non-recoverable
> error leading to driver load abort, so we could not even read related
> GuC error log to investigate cause of the problem.
>
> Fix that by always returning -EIO on uC hardware related failure.
>
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx>
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Cc: Michal Winiarski <michal.winiarski@xxxxxxxxx>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
> Cc: Sagar Arun Kamble <sagar.a.kamble@xxxxxxxxx>
> ---
> drivers/gpu/drm/i915/intel_uc.c | 5 +----
> 1 file changed, 1 insertion(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/intel_uc.c
b/drivers/gpu/drm/i915/intel_uc.c
> index 3a13cbb..1547e16 100644
> --- a/drivers/gpu/drm/i915/intel_uc.c
> +++ b/drivers/gpu/drm/i915/intel_uc.c
> @@ -432,11 +432,8 @@ int intel_uc_init_hw(struct drm_i915_private
*dev_priv)
> * Note that there is no fallback as either user explicitly
asked for
> * the GuC or driver default option was to run with the GuC
enabled.
> */
> - if (GEM_WARN_ON(ret == -EIO))
> - ret = -EINVAL;
> -
> dev_err(dev_priv->drm.dev, "GuC initialization failed %d\n",
ret);
> - return ret;
> + return -EIO; /* We want to disable GPU submission but keep
KMS alive */
> }
We'll need to avoid uc_fini_hw/ in unload path since GPU will be marked
wedged after this?
We should be avoiding uc_fini_hw in unload because we haven't completed
uc_init_hw. Such failure should already be handled?
It's even worst. In case of uc_init_hw() failure we always call uc_fini()
and uc_fini_misc() regardless of EIO, and then in case of running
in wedged more we will try to call them again in unload path ...
While it is easy to postpone uc_fini[_misc] calls to unload stage,
it is tricky to avoid calling uc_fini_hw there without adding extra check
(earlier there was discussion that we should not have any conditions
in _fini functions)
Any ideas how to solve that ?
Earlier we could rely on status of enable_guc_loading/submission. Now I
think we need to
have status per uc, uc_misc, uc_hw as many internal steps can't rely on
like pointer check.
Michal
--
Thanks,
Sagar
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx