Re: [PATCH v7 02/20] drm/i915: Modify error handler for per engine hang recovery

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 





On 5/12/2017 2:09 PM, Chris Wilson wrote:
On Fri, May 12, 2017 at 01:55:11PM -0700, Michel Thierry wrote:
On 5/8/2017 11:31 AM, Michel Thierry wrote:
On 4/29/2017 7:19 AM, Chris Wilson wrote:
On Thu, Apr 27, 2017 at 04:12:42PM -0700, Michel Thierry wrote:
From: Arun Siluvery <arun.siluvery@xxxxxxxxxxxxxxx>

...
+    }
+
    intel_prepare_reset(dev_priv);

    set_bit(I915_RESET_HANDOFF, &dev_priv->gpu_error.flags);
@@ -2680,6 +2706,7 @@ static void i915_reset_and_wakeup(struct
drm_i915_private *dev_priv)
        kobject_uevent_env(kobj,
                   KOBJ_CHANGE, reset_done_event);

+finish:
    /*
     * Note: The wake_up also serves as a memory barrier so that
     * waiters see the updated value of the dev_priv->gpu_error.
@@ -2781,7 +2808,7 @@ void i915_handle_error(struct drm_i915_private
*dev_priv,
                 &dev_priv->gpu_error.flags))
        goto out;

-    i915_reset_and_wakeup(dev_priv);
+    i915_reset_and_wakeup(dev_priv, engine_mask);

? You don't need to wakeup the struct_mutex so we don't need this after
per-engine resets. Time to split up i915_reset_and_wakeup(), because we
certainly shouldn't be calling intel_finish_reset() without first calling
intel_prepare_reset(). Which is right here in my tree...


Looking at your tree, it wouldn't call finish_reset there either, only
these two are called after a successful reset:

finish:
   clear_bit(I915_RESET_BACKOFF, &dev_priv->gpu_error.flags);
   wake_up_all(&dev_priv->gpu_error.reset_queue);

But you're right, we only need to clear the error flag, no need to call
wake_up_all.

Should I move the per-engine reset to i915_handle_error, and then leave
i915_reset_and_wakeup just for full resets?
That would also make the promotion from per-engine to global look a bit
'clearer'.


I just noticed an issue if I don't call wake_up_all. There can be
someone else waiting for the reset to complete
(i915_mutex_lock_interruptible -> i915_gem_wait_for_error).

I915_RESET_BACKOFF has/had 2 roles, stop any other user to grab the
struct mutex (which we won't need in reset-engine) and prevent two
concurrent reset attempts (which we still want). Time to add a new
flag for the later? (I915_RESET_ENGINE_IN_PROGRESS?)

Yes, that would be a good idea to avoid dual purposing the bits. Now
that we do direct resets along the wait path, we can completely drop the
i915_mutex_interruptible(). (No one else should be holding the mutex
indefinitely.) I think that's a better approach -- I think we've already
moved all the EIO magic aware to the ABI points where we deemed it
necessary.

And it seems to work ok with the new flag and no wake_up. I'll run more tests.

Thanks
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux