On Tue, Nov 15, 2016 at 04:36:33PM +0200, Mika Kuoppala wrote:
> As hangcheck score was removed, the active decay of score
> was removed also. This removed feature for hangcheck to detect
> if the gpu client was accidentally or maliciously causing intermittent
> hangs. Reinstate the scoring as a per context property, so that if
> one context starts to act unfavourably, ban it.
>
> v2: ban_period_secs as a gate to score check (Chris)
>
> Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>

> -	elapsed = get_seconds() - ctx->hang_stats.guilty_ts;
> -	if (ctx->hang_stats.ban_period_seconds &&
> -	    elapsed <= ctx->hang_stats.ban_period_seconds) {
> +	if (!hs->ban_period_seconds)
> +		return false;
> +
> +	elapsed = get_seconds() - hs->guilty_ts;
> +	if (elapsed <= hs->ban_period_seconds) {
>  		DRM_DEBUG("context hanging too fast, banning!\n");
>  		return true;
>  	}
>
> +	if (hs->ban_score >= 40) {
> +		DRM_DEBUG("context hanging too often, banning!\n");
> +		return true;
> +	}
> +
>  	return false;
>  }

> +	hs->ban_score += 10;

This pair (the ban threshold of 40 and the per-hang increment of 10)
should be tunables (i.e. a macro somewhere sensible).

> diff --git a/drivers/gpu/drm/i915/i915_gem_request.c b/drivers/gpu/drm/i915/i915_gem_request.c
> index b9b5253..095c809 100644
> --- a/drivers/gpu/drm/i915/i915_gem_request.c
> +++ b/drivers/gpu/drm/i915/i915_gem_request.c
> @@ -204,6 +204,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>
>  	trace_i915_gem_request_retire(request);
>
> +	/* Retirement decays the ban score as it is a sign of ctx progress */
> +	if (request->ctx->hang_stats.ban_score > 0)
> +		request->ctx->hang_stats.ban_score--;

Please put this along with the other request->ctx updates (i.e. after
request->previous_context and before the context_put).

Otherwise lgtm.
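
To illustrate the tunables suggestion, here is a rough userspace sketch, not
the actual driver code. The macro names CONTEXT_SCORE_GUILTY and
CONTEXT_SCORE_BAN_THRESHOLD, the helper names, and the cut-down struct are
all made up for the example; only the values 10 and 40 and the shape of the
checks come from the patch. The current time is passed in as a parameter in
place of get_seconds() so the model can be exercised outside the kernel.

```c
#include <stdbool.h>

/* Hypothetical names; the values are the literals used in the patch. */
#define CONTEXT_SCORE_GUILTY		10	/* added per guilty hang */
#define CONTEXT_SCORE_BAN_THRESHOLD	40	/* ban at or above this */

/* Cut-down stand-in for the per-context hang statistics. */
struct hang_stats {
	unsigned long guilty_ts;		/* time of last guilty hang */
	unsigned long ban_period_seconds;	/* 0 disables banning */
	unsigned int ban_score;
};

/* Mirrors the quoted check, with the literal 40 replaced by a macro. */
static bool context_is_too_hangy(const struct hang_stats *hs,
				 unsigned long now)
{
	unsigned long elapsed;

	if (!hs->ban_period_seconds)
		return false;

	elapsed = now - hs->guilty_ts;
	if (elapsed <= hs->ban_period_seconds)
		return true;	/* hanging too fast */

	if (hs->ban_score >= CONTEXT_SCORE_BAN_THRESHOLD)
		return true;	/* hanging too often */

	return false;
}

/* Record a guilty hang: bump the score and remember when it happened. */
static void mark_guilty(struct hang_stats *hs, unsigned long now)
{
	hs->ban_score += CONTEXT_SCORE_GUILTY;
	hs->guilty_ts = now;
}

/* Retirement decays the score, as it is a sign of context progress. */
static void note_retire(struct hang_stats *hs)
{
	if (hs->ban_score > 0)
		hs->ban_score--;
}
```

With these values, four guilty hangs with no retirements in between reach
the threshold, and each retired request buys back a tenth of a hang.
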
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre