On 6/2/2017 1:16 PM, Chris Wilson wrote:
Quoting Michel Thierry (2017-05-22 18:46:24)
+ /* try engine reset first, and continue if fails */
/* Please use sentences when convenient. It looks much neater that way. */
_less_ broken English:
/*
* Try engine reset when available. We fall back to full reset if
* single reset fails.
*/
+ if (intel_has_reset_engine(dev_priv)) {
+ struct intel_engine_cs *engine;
+ unsigned int tmp;
+
+ /* protect against concurrent reset attempts */
+ if (test_and_set_bit(I915_RESET_ENGINE_IN_PROGRESS,
+ &dev_priv->gpu_error.flags))
+ goto out;
+
+ for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
+ if (i915_reset_engine(engine) == 0)
+ engine_mask &= ~intel_engine_flag(engine);
+ }
+
+ /* clear unconditionally, full reset won't care about it */
+ clear_bit(I915_RESET_ENGINE_IN_PROGRESS,
+ &dev_priv->gpu_error.flags);
+
+ if (!engine_mask)
+ goto out;
+ }
+
+ /* full reset needs the mutex, stop any other user trying to do so */
if (test_and_set_bit(I915_RESET_BACKOFF,
&dev_priv->gpu_error.flags))
We have a big gaping hole here in coordination between a global reset
and per-engine resets.
I think you want to do something like:
if (intel_has_engine_reset()) {
for_each_engine_mask() {
if (test_and_set_bit(I915_RESET_ENGINE + engine->id))
continue;
if (i915_reset_engine() == 0)
engine_mask &= ~engine->mask;
clear_bit(I915_RESET_ENGINE + engine->id);
wake_up_bit(I915_RESET_ENGINE + engine->id);
}
}
if (!engine_mask)
goto out;
if (test_and_set_bit(I915_RESET_BACKOFF))
goto out;
for_each_engine() {
wait_on_bit(I915_RESET_ENGINE + engine->id);
set_bit(I915_RESET_ENGINE);
}
...do global reset...
for_each_engine() {
clear_bit(I915_RESET_ENGINE);
}
The idea is that any per-engine reset that comes in whilst global is in
progress can skip, and that the global waits for any per-engine reset to
finish before clobbering state. The window in which global reset
completes and a new hang is detected before we clear the bits, I am
judiciously ignoring. Also this should allow us to perform parallel
resets between engines. (Now that's a selling point!)
Fair enough, I can change it to support parallel resets.
One thing I forgot to ask, what should I do with the error/reset
uevents? As it is, we will only tell userspace in case of global reset.
Thanks
_______________________________________________
Intel-gfx mailing list
Intel-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/intel-gfx