On 2019-10-24 at 07:14:56 +0100, Chris Wilson wrote: > Quoting Ramalingam C (2019-10-24 02:32:01) > > On 2019-10-23 at 14:31:04 +0100, Chris Wilson wrote: > > > If we are doing a normal GPU reset triggered after detecting a long > > > period of stalled work, we can take our time and allow the engines to > > > quiesce. Since we've stopped submission to the engine, and if we wait > > > long enough an innocent context should complete, leaving the engine idle. > > > So by waiting a short amount of time, we should prevent clobbering other > > > users when resetting a stuck context. > > > > > > Suggested-by: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > > > Suggested-by: Jon Bloomfield <jon.bloomfield@xxxxxxxxx> > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > > Cc: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > > > Cc: Joonas Lahtinen <joonas.lahtinen@xxxxxxxxxxxxxxx> > > > Reviewed-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > > > --- > > > drivers/gpu/drm/i915/Kconfig.profile | 11 +++++++++++ > > > drivers/gpu/drm/i915/gt/intel_engine_cs.c | 20 +++++++++++++++++++- > > > drivers/gpu/drm/i915/gt/intel_engine_types.h | 4 ++++ > > > 3 files changed, 34 insertions(+), 1 deletion(-) > > > > > > diff --git a/drivers/gpu/drm/i915/Kconfig.profile b/drivers/gpu/drm/i915/Kconfig.profile > > > index 48df8889a88a..3a3881d5e44b 100644 > > > --- a/drivers/gpu/drm/i915/Kconfig.profile > > > +++ b/drivers/gpu/drm/i915/Kconfig.profile > > > @@ -25,3 +25,14 @@ config DRM_I915_SPIN_REQUEST > > > May be 0 to disable the initial spin. In practice, we estimate > > > the cost of enabling the interrupt (if currently disabled) to be > > > a few microseconds. > > > + > > > +config DRM_I915_STOP_TIMEOUT > > > + int "How long to wait for an engine to quiesce gracefully before reset (ms)" > > > + default 100 # milliseconds > > > + help > > > + By stopping submission and sleeping for a short time before resetting > > > + the GPU, we allow the innocent contexts also on the system to quiesce. > > > + It is then less likely for a hanging context to cause collateral > > > + damage as the system is reset in order to recover. The corollary is > > > + that the reset itself may take longer and so be more disruptive to > > > + interactive or low latency workloads. > > > diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > > > index 0e20713603ec..e4203eb44139 100644 > > > --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c > > > +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c > > > @@ -308,6 +308,9 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id) > > > engine->instance = info->instance; > > > __sprint_engine_name(engine); > > > > > > + engine->props.stop_timeout_ms = > > > + CONFIG_DRM_I915_STOP_TIMEOUT; > > Compare to previous version where you used the CONFIG variable directly, > > what is the benefit of using it through a variable? So that we could > > alter it in runtime? > > That is what the next series proposes. Also it turns out to be more > easily testable if one is able to alter the timeouts for testing. Thanks for explaining Chris! -Ram. > -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx