On 5/8/2017 11:31 AM, Michel Thierry wrote:
On 4/29/2017 7:19 AM, Chris Wilson wrote:
On Thu, Apr 27, 2017 at 04:12:42PM -0700, Michel Thierry wrote:
From: Arun Siluvery <arun.siluvery@xxxxxxxxxxxxxxx>
...
+ }
+
intel_prepare_reset(dev_priv);
set_bit(I915_RESET_HANDOFF, &dev_priv->gpu_error.flags);
@@ -2680,6 +2706,7 @@ static void i915_reset_and_wakeup(struct
drm_i915_private *dev_priv)
kobject_uevent_env(kobj,
KOBJ_CHANGE, reset_done_event);
+finish:
/*
* Note: The wake_up also serves as a memory barrier so that
* waiters see the updated value of the dev_priv->gpu_error.
@@ -2781,7 +2808,7 @@ void i915_handle_error(struct drm_i915_private
*dev_priv,
&dev_priv->gpu_error.flags))
goto out;
- i915_reset_and_wakeup(dev_priv);
+ i915_reset_and_wakeup(dev_priv, engine_mask);
? You don't need to wakeup the struct_mutex so we don't need this after
per-engine resets. Time to split up i915_reset_and_wakeup(), because we
certainly shouldn't be calling intel_finish_reset() without first calling
intel_prepare_reset(). Which is right here in my tree...
Looking at your tree, it wouldn't call finish_reset there either, only
these two are called after a successful reset:
finish:
clear_bit(I915_RESET_BACKOFF, &dev_priv->gpu_error.flags);
wake_up_all(&dev_priv->gpu_error.reset_queue);
But you're right, we only need to clear the error flag, no need to call
wake_up_all.
Should I move the per-engine reset to i915_handle_error, and then leave
i915_reset_and_wakeup just for full resets?
That would also make the promotion from per-engine to global look a bit
'clearer'.
I just noticed an issue if I don't call wake_up_all. There can be
someone else waiting for the reset to complete
(i915_mutex_lock_interruptible -> i915_gem_wait_for_error).
I915_RESET_BACKOFF has/had 2 roles, stop any other user to grab the
struct mutex (which we won't need in reset-engine) and prevent two
concurrent reset attempts (which we still want). Time to add a new flag
for the later? (I915_RESET_ENGINE_IN_PROGRESS?)
Here's an example without calling wake_up_all (10s timeout):
[ 126.816054] [drm:i915_reset_engine [i915]] resetting rcs0
...
[ 137.499910] [IGT] gem_ringfill: exiting, ret=0
Compared to the one that does,
[ 69.799519] [drm:i915_reset_engine [i915]] resetting rcs0
...
[ 69.801335] [IGT] gem_tdr: exiting, ret=0
Thanks,
-Michel
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx