On Mon, Aug 28, 2017 at 08:44:48PM +0100, Chris Wilson wrote: > Quoting jeff.mcgee@xxxxxxxxx (2017-08-28 20:25:30) > > From: Jeff McGee <jeff.mcgee@xxxxxxxxx> > > > > If someone else is resetting the engine we should clear our own bit as > > part of skipping that engine. Otherwise we will later believe that it > > has not been reset successfully and then trigger full gpu reset. If the > > other guy's reset actually fails, he will trigger the full gpu reset. > > The reason we did continue on to the global reset was to serialise > i915_handle_error() with the other thread. Not a huge issue, but a > reasonable property to keep -- and we definitely want a to explain why > only one reset at a time is important. > > bool intel_engine_lock_reset() { > if (!test_and_set_bit(I915_RESET_ENGINE + engine->id, > &engine->i915->gpu_error.flags)) > return true; > > intel_engine_wait_for_reset(engine); The current code doesn't wait for the other thread to finish the reset, but this would add that wait. Did you intend that as an additional change to the current code? I don't think it is necessary. Each thread wants to reset some subset of engines, so it seems the thread can safely exit as soon as it knows each of those engines has been reset or is being reset as part of another thread that got the lock first. If any of the threads fail to reset an engine they "own", then full gpu reset is assured. -Jeff > return false; /* somebody else beat us to the reset */ > } > > void intel_engine_wait_for_reset() { > while (test_and_set_bit(I915_RESET_ENGINE + engine->id, > &engine->i915->gpu_error.flags)) > wait_on_bit(&engine->i915->gpu_error.flags, I915_RESET_ENGINE + engine->id, > TASK_UNINTERRUPTIBLE); > } > > It can also be used by selftests/intel_hangcheck.c, so let's refactor > before we have 3 copies. > -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx