Re: [PATCH v9 05/21] drm/i915: Add support for per engine reset recovery

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Mon, 19 Jun 2017 13:31:04 +0100



Quoting Michel Thierry (2017-06-15 21:18:12)
>  int i915_gem_reset_prepare(struct drm_i915_private *dev_priv)
>  {
>         struct intel_engine_cs *engine;
> +       struct drm_i915_gem_request *request;
>         enum intel_engine_id id;
>         int err = 0;
>  
> -       /* Ensure irq handler finishes, and not run again. */
>         for_each_engine(engine, dev_priv, id) {
> -               struct drm_i915_gem_request *request = NULL;
> -
> -               /* Prevent the signaler thread from updating the request
> -                * state (by calling dma_fence_signal) as we are processing
> -                * the reset. The write from the GPU of the seqno is
> -                * asynchronous and the signaler thread may see a different
> -                * value to us and declare the request complete, even though
> -                * the reset routine have picked that request as the active
> -                * (incomplete) request. This conflict is not handled
> -                * gracefully!
> -                */
> -               kthread_park(engine->breadcrumbs.signaler);
> -
> -               /* Prevent request submission to the hardware until we have
> -                * completed the reset in i915_gem_reset_finish(). If a request
> -                * is completed by one engine, it may then queue a request
> -                * to a second via its engine->irq_tasklet *just* as we are
> -                * calling engine->init_hw() and also writing the ELSP.
> -                * Turning off the engine->irq_tasklet until the reset is over
> -                * prevents the race.
> -                */
> -               tasklet_kill(&engine->irq_tasklet);
> -               tasklet_disable(&engine->irq_tasklet);
> -
> -               if (engine->irq_seqno_barrier)
> -                       engine->irq_seqno_barrier(engine);
> -
> -               if (engine_stalled(engine)) {
> -                       request = i915_gem_find_active_request(engine);
> -                       if (request && request->fence.error == -EIO)
> -                               err = -EIO; /* Previous reset failed! */
> +               request = i915_gem_reset_prepare_engine(engine);
> +               if (IS_ERR(request)) {
> +                       err = PTR_ERR(request);
> +                       break;

s/break/continue/

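Otherwise prepare/finish are unbalanced: the engines after the failing one
never have their tasklet disabled, yet the finish path still re-enables
every engine, leaving the tasklets very confused.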
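
To illustrate the imbalance, here is a minimal userspace sketch, not the
driver code. All toy_* names, the disable_depth counter and TOY_EIO are
hypothetical stand-ins: disable_depth mimics a tasklet disable count, and the
sketch assumes the finish step walks every engine unconditionally.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_ENGINES 4
#define TOY_EIO 5 /* hypothetical stand-in for EIO */

/* Hypothetical engine: disable_depth mimics a tasklet disable count. */
struct toy_engine {
	int disable_depth;
	bool fails_prepare; /* pretend a previous reset failed on this engine */
};

/* Per-engine prepare: always quiesces the engine, then reports any error. */
static int toy_prepare_engine(struct toy_engine *engine)
{
	engine->disable_depth++; /* stands in for tasklet_disable() */
	return engine->fails_prepare ? -TOY_EIO : 0;
}

/*
 * Prepare all engines. With stop_on_error set this behaves like the
 * 'break' in the patch; otherwise like the suggested 'continue'.
 */
static int toy_prepare_all(struct toy_engine *engines, size_t count,
			   bool stop_on_error)
{
	int err = 0;

	for (size_t i = 0; i < count; i++) {
		int ret = toy_prepare_engine(&engines[i]);

		if (ret) {
			err = ret;
			if (stop_on_error)
				break; /* later engines are never disabled */
			continue; /* keep preparing the remaining engines */
		}
	}

	return err;
}

/* Finish (assumed here to visit every engine): one enable per engine. */
static void toy_finish_all(struct toy_engine *engines, size_t count)
{
	for (size_t i = 0; i < count; i++)
		engines[i].disable_depth--; /* stands in for tasklet_enable() */
}

/* Balanced means every disable was matched by exactly one enable. */
static bool toy_balanced(const struct toy_engine *engines, size_t count)
{
	for (size_t i = 0; i < count; i++)
		if (engines[i].disable_depth != 0)
			return false;

	return true;
}
```

With a failing engine in the middle, the stop_on_error variant leaves the
later engines with a negative depth after toy_finish_all(), while the
continue variant returns the same error and ends up balanced.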
-Chris