Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:

> Since we use the debugfs to recover the device after modifying the
> i915.reset parameter, we need to be sure that we apply the reset and not
> piggy-back onto a concurrent one in order for the parameter to take
> effect.
>
> Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 10 +++-------
>  1 file changed, 3 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index a6fd157b1637..8a488ffc8b7d 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -3874,13 +3874,9 @@ i915_wedged_set(void *data, u64 val)
>  {
>  	struct drm_i915_private *i915 = data;
>
> -	/*
> -	 * There is no safeguard against this debugfs entry colliding
> -	 * with the hangcheck calling same i915_handle_error() in
> -	 * parallel, causing an explosion. For now we assume that the
> -	 * test harness is responsible enough not to inject gpu hangs
> -	 * while it is writing to 'i915_wedged'
> -	 */
> +	/* Flush any previous reset before applying for a new one */
> +	wait_event(i915->gpu_error.reset_queue,
> +		   !test_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags));

You removed the comment, and yes, this makes us wait for our turn
to flip the switch. But the race between hangcheck and this debugfs
entry still holds.

Now, even if the two were to pile onto this switch, there should be
no harm, as in that case we would see two log entries resulting in
a single reset.

Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx>

-Mika

>
>  	i915_handle_error(i915, val, I915_ERROR_CAPTURE,
>  			  "Manually set wedged engine mask = %llx", val);
> --
> 2.20.1
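
For readers following the thread, here is a rough userspace sketch of the flush-then-request pattern discussed above. It is not the i915 code: it uses POSIX threads instead of the kernel's wait_event()/test_bit(), and every name in it (reset_state, reset_flush, reset_begin, reset_end, wedged_set) is hypothetical. It also illustrates the review comment: the flush and the request are two separate steps, so a concurrent requester can still slip in between them, and the later one simply finds a reset already in flight.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the driver's reset bookkeeping. */
struct reset_state {
	pthread_mutex_t lock;
	pthread_cond_t queue;	/* plays the role of reset_queue */
	bool backoff;		/* plays the role of I915_RESET_BACKOFF */
	unsigned int resets;	/* resets actually completed */
};

#define RESET_STATE_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, 0 }

/* Block until no reset is in flight (the analogue of the added wait). */
static void reset_flush(struct reset_state *rs)
{
	pthread_mutex_lock(&rs->lock);
	while (rs->backoff)
		pthread_cond_wait(&rs->queue, &rs->lock);
	pthread_mutex_unlock(&rs->lock);
}

/* Try to start a reset; returns false if one is already in flight. */
static bool reset_begin(struct reset_state *rs)
{
	bool started;

	pthread_mutex_lock(&rs->lock);
	started = !rs->backoff;
	if (started)
		rs->backoff = true;
	pthread_mutex_unlock(&rs->lock);
	return started;
}

/* Finish the reset and wake everyone waiting in reset_flush(). */
static void reset_end(struct reset_state *rs)
{
	pthread_mutex_lock(&rs->lock);
	rs->backoff = false;
	rs->resets++;
	pthread_cond_broadcast(&rs->queue);
	pthread_mutex_unlock(&rs->lock);
}

/*
 * Flush, then request. Another thread may call reset_begin() between
 * the two steps; in that case we return false and only one reset runs.
 */
static bool wedged_set(struct reset_state *rs)
{
	reset_flush(rs);
	if (!reset_begin(rs))
		return false;
	reset_end(rs);
	return true;
}
```

In this sketch a caller that loses the race gets false back and could retry; whether the real driver needs anything similar is outside what the thread states.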