Quoting Mika Kuoppala (2019-02-08 09:56:59)
> Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> writes:
>
> > Since we use the debugfs to recover the device after modifying the
> > i915.reset parameter, we need to be sure that we apply the reset and
> > not piggy-back onto a concurrent one in order for the parameter to
> > take effect.
> >
> > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/i915/i915_debugfs.c | 10 +++-------
> >  1 file changed, 3 insertions(+), 7 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index a6fd157b1637..8a488ffc8b7d 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -3874,13 +3874,9 @@ i915_wedged_set(void *data, u64 val)
> >  {
> >  	struct drm_i915_private *i915 = data;
> >  
> > -	/*
> > -	 * There is no safeguard against this debugfs entry colliding
> > -	 * with the hangcheck calling same i915_handle_error() in
> > -	 * parallel, causing an explosion. For now we assume that the
> > -	 * test harness is responsible enough not to inject gpu hangs
> > -	 * while it is writing to 'i915_wedged'
> > -	 */
> > +	/* Flush any previous reset before applying for a new one */
> > +	wait_event(i915->gpu_error.reset_queue,
> > +		   !test_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags));
>
> You removed the comment, and yes, this makes us wait for our turn
> to flip the switch. But the hangcheck vs this race still holds.

Concurrent resets have been safe for yonks... But what I realised is
that our piggy-backing of the two resets into one meant that if the
value of the i915.reset modparam changed, we didn't run with the
updated value.
-Chris
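To make the piggy-backing concrete, here is a minimal userspace sketch
of the failure mode. This is not i915 code: a pthread mutex/condvar
loop stands in for wait_event() on gpu_error.reset_queue and the
I915_RESET_BACKOFF bit, and the names (request_reset, hangcheck,
reset_in_progress, reset_param) are all hypothetical.

#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t reset_done = PTHREAD_COND_INITIALIZER;
static bool reset_in_progress;
static int reset_param = 1;	/* stand-in for the i915.reset modparam */

static void request_reset(bool flush)
{
	int param;

	pthread_mutex_lock(&lock);
	if (reset_in_progress && !flush) {
		/* Piggy-back: treat the in-flight reset, which latched
		 * the *old* parameter, as also serving this request. */
		pthread_mutex_unlock(&lock);
		return;
	}
	/* Flush any previous reset before applying for a new one;
	 * the userspace analogue of waiting for I915_RESET_BACKOFF
	 * to clear. */
	while (reset_in_progress)
		pthread_cond_wait(&reset_done, &lock);
	reset_in_progress = true;
	param = reset_param;	/* latched only when *our* reset starts */
	pthread_mutex_unlock(&lock);

	printf("resetting with i915.reset=%d\n", param);
	usleep(100 * 1000);	/* the reset takes a while */

	pthread_mutex_lock(&lock);
	reset_in_progress = false;
	pthread_cond_broadcast(&reset_done);
	pthread_mutex_unlock(&lock);
}

static void *hangcheck(void *arg)
{
	(void)arg;
	request_reset(false);	/* a concurrent reset raised by hangcheck */
	return NULL;
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, hangcheck, NULL);
	usleep(10 * 1000);	/* let hangcheck's reset get in flight */

	pthread_mutex_lock(&lock);
	reset_param = 0;	/* e.g. writing 0 to the modparam */
	pthread_mutex_unlock(&lock);

	request_reset(false);	/* piggy-backs: returns without ever
				 * running with reset_param == 0 */
	request_reset(true);	/* flushes first: a fresh reset that
				 * observes the updated value */

	pthread_join(t, NULL);
	return 0;
}

The first request_reset(false) after the parameter write models the old
debugfs behaviour: it joins hangcheck's in-flight reset, which latched
reset_param == 1, so the updated value never takes effect. The flushing
variant waits its turn and then runs a fresh reset that sees the new
value, which is what the patch makes i915_wedged_set() do.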