Re: [PATCH 3/4] drm/i915/uc: Inject load errors into intel_uc_init_hw

On Tue, 30 Jul 2019 11:27:51 +0200, Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> wrote:

Quoting Chris Wilson (2019-07-30 09:47:37)
Quoting Michal Wajdeczko (2019-07-29 16:23:00)
> Inject load errors into intel_uc_init_hw to make sure we
> correctly handle uC initialization failures.
>
> To avoid complaints from CI about inserted errors or warnings,
> use a helper macro that checks if there was an error injection.
>
> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@xxxxxxxxx>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@xxxxxxxxx>
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/gt/uc/intel_uc.c | 8 ++++++++
>  drivers/gpu/drm/i915/i915_drv.h       | 7 ++++++-
>  drivers/gpu/drm/i915/i915_gem.c       | 2 +-
>  3 files changed, 15 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/uc/intel_uc.c b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> index fafa9be1e12a..9e1156c29cb1 100644
> --- a/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> +++ b/drivers/gpu/drm/i915/gt/uc/intel_uc.c
> @@ -400,6 +400,14 @@ int intel_uc_init_hw(struct intel_uc *uc)
>         if (!intel_uc_is_using_guc(uc))
>                 return 0;
>
> +       ret = i915_inject_load_error(i915, -EIO);
> +       if (ret)
> +               return ret;
> +
> +       ret = i915_inject_load_error(i915, -ENXIO);
> +       if (ret)
> +               return ret;
> +
>         GEM_BUG_ON(!intel_uc_fw_supported(&guc->fw));
>
>         guc_reset_interrupts(guc);
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 6b059d51aaff..36f7a146f06a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -137,9 +137,14 @@ bool i915_error_injected(void);
>
> #define i915_inject_probe_failure(i915) i915_inject_load_error((i915), -ENODEV)
>
> -#define i915_probe_error(i915, fmt, ...) \
> +#define I915_ERROR(i915, fmt, ...) \
> __i915_printk(i915, i915_error_injected() ? KERN_DEBUG : KERN_ERR, \
>                       fmt, ##__VA_ARGS__)
> +#define I915_WARN(i915, fmt, ...) \
> + __i915_printk(i915, i915_error_injected() ? KERN_DEBUG : KERN_WARNING, \
> +                     fmt, ##__VA_ARGS__)

I didn't see I915_WARN being used in this series. Is it likely? We either
abort the module load, in which case it is an error, or we are quite happy
to continue, in which case I'd vote for a "normal but significant condition",
i.e. KERN_NOTICE.

Spotted the I915_WARN, I guess that's reasonable.

However, I do dislike I915_ERROR / I915_WARN on principle. Especially as
they have the coupling to the probe-failure magic and I don't feel like
it is a good idea to have that spread too far.

But what else can we do in i915 alone to
a) have messages that show the origin of the problem as soon as possible,
b) ignore these fake errors if they are caused by injected failures?

Otherwise we risk that all newly injected errors will be immediately
reported as regressions and we will stop at CI.BAT.

There are other options but we need some buy-in from the CI team:

- we can use existing messages to temporarily ignore errors between
  "injecting" and "failed"

i915 0000:00:02.0: Injecting failure %d at checkpoint %u [file:line]
i915 0000:00:02.0: Device initialization failed (%d)

- we can use existing messages to temporarily ignore errors between
  "injecting" and a new "completed"

i915 0000:00:02.0: Injecting failure %d at checkpoint %u [file:line]
i915 0000:00:02.0: Checkpoint %u failure recovery completed
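
To illustrate what such a CI-side filter might look like, here is a
minimal userspace sketch in C. It is not existing CI code. The function
name count_unexpected_errors() and the assumption that every log line
carries a printk-style "<3>" prefix for KERN_ERR are hypothetical. Only
the "Injecting failure" opening marker and the two closing messages come
from the examples above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Assumption: lines carry a printk-style level prefix, "<3>" == KERN_ERR. */
static bool is_error_line(const char *line)
{
	return strncmp(line, "<3>", 3) == 0;
}

/*
 * Count error lines that fall outside an injection window.
 * A window opens at "Injecting failure" and closes at the first line
 * containing close_marker, e.g. "Device initialization failed" or
 * "failure recovery completed".
 */
size_t count_unexpected_errors(const char *const *lines, size_t n,
			       const char *close_marker)
{
	bool in_window = false;
	size_t unexpected = 0;

	for (size_t i = 0; i < n; i++) {
		if (strstr(lines[i], "Injecting failure")) {
			in_window = true;
			continue;
		}
		if (in_window && strstr(lines[i], close_marker)) {
			/* The closing message itself is expected. */
			in_window = false;
			continue;
		}
		if (!in_window && is_error_line(lines[i]))
			unexpected++;
	}
	return unexpected;
}
```

With this approach the driver keeps printing at its normal log levels,
and the decision to ignore errors moves into the log parser. A closing
message seen without a preceding "Injecting failure" is still counted as
a real error.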

Btw, I'm also thinking about introducing 'recoverable' failures, which
could be used not only to make sure that we don't break the driver while
unwinding, but also to make sure that the driver was ultimately able to
handle such failures in the expected way:

i915_inject_load_error(i915, -ENOMEM); // probe must fail
i915_inject_recoverable_error(i915, -EIO, WEDGED); // probe ok but gpu wedged
i915_inject_recoverable_error(i915, -ENOEXEC, NORMAL); // probe must be ok
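
A rough userspace model of how this could work is sketched below. It is
only an illustration of the idea, not the i915 implementation: struct
injector, inject_error(), outcome_matches() and the STATE_* names are
all hypothetical, and the checkpoint counting is a simplified assumption
about how a single target checkpoint gets selected.

```c
#include <stdbool.h>

/* Hypothetical outcomes that an injection site may demand. */
enum expected_state { STATE_NORMAL, STATE_WEDGED, STATE_PROBE_FAIL };

struct injector {
	unsigned int target;        /* checkpoint to fail at; 0 disables */
	unsigned int count;         /* checkpoints visited so far */
	bool injected;              /* true once the failure has fired */
	enum expected_state expect; /* outcome demanded by the firing site */
};

/* Returns err at the target checkpoint, 0 at every other checkpoint. */
int inject_error(struct injector *inj, int err, enum expected_state expect)
{
	if (inj->target == 0 || inj->injected)
		return 0;
	if (++inj->count != inj->target)
		return 0;

	inj->injected = true;
	inj->expect = expect;
	return err;
}

/* After probe: did the driver end up in the state the site asked for? */
bool outcome_matches(const struct injector *inj, int probe_ret, bool wedged)
{
	if (!inj->injected)
		return probe_ret == 0 && !wedged;

	switch (inj->expect) {
	case STATE_PROBE_FAIL:
		return probe_ret != 0;
	case STATE_WEDGED:
		return probe_ret == 0 && wedged;
	case STATE_NORMAL:
		return probe_ret == 0 && !wedged;
	}
	return false;
}
```

Recording the expected state at the injection site would let a test
check the final driver state, in addition to checking that the unwind
path did not crash.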
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx



