Quoting Mika Kuoppala (2018-06-15 11:18:28) > If client is smart or lucky enough to create a new context > after each hang, our context banning mechanism will never > catch up, and as a result of that it will be saved from > client banning. This can result in a never ending streak of > gpu hangs caused by bad or malicious client, preventing > access from other legit gpu clients. > > Fix this by always incrementing per client ban score if > it hangs in short successions regardless of context ban > scoring. The exception are non bannable contexts. They remain > detached from client ban scoring mechanism. > > v2: xchg timestamp, tidyup (Chris) > > Fixes: b083a0870c79 ("drm/i915: Add per client max context ban limit") > Cc: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxxxxxxxx> > --- > drivers/gpu/drm/i915/i915_drv.h | 20 ++++++--- > drivers/gpu/drm/i915/i915_gem.c | 57 +++++++++++++++++-------- > drivers/gpu/drm/i915/i915_gem_context.c | 2 +- > 3 files changed, 54 insertions(+), 25 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h > index 74dd88d8563e..93aa8e7dfaba 100644 > --- a/drivers/gpu/drm/i915/i915_drv.h > +++ b/drivers/gpu/drm/i915/i915_drv.h > @@ -352,14 +352,20 @@ struct drm_i915_file_private { > > unsigned int bsd_engine; > > -/* Client can have a maximum of 3 contexts banned before > - * it is denied of creating new contexts. As one context > - * ban needs 4 consecutive hangs, and more if there is > - * progress in between, this is a last resort stop gap measure > - * to limit the badly behaving clients access to gpu. > +/* Every context ban increments per client ban score. Also /* * Every One day we'll have rewritten every line of code, and every comment. > + * hangs in short succession increments ban score. If client suffers 3 > + * context bans, 9 hangs in quick succession or combination of those, Leave out the numbers if possible, just explain the rationale of the multilevel system. > + * it is banned and submitting more work will fail. This is a stop gap > + * measure to limit the badly behaving clients access to gpu. > + * Note that unbannable contexts never increment the client ban score. > */ > -#define I915_MAX_CLIENT_CONTEXT_BANS 3 > - atomic_t context_bans; > +#define I915_CLIENT_SCORE_HANG_FAST 1 > +#define I915_CLIENT_FAST_HANG_JIFFIES (60 * HZ) > +#define I915_CLIENT_SCORE_CONTEXT_BAN 3 > +#define I915_CLIENT_SCORE_BANNED 9 > + /** ban_score: Accumulated score of all ctx bans and fast hangs. */ > + atomic_t ban_score; > + unsigned long hang_timestamp; > }; > > /* Interface history: > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index 8dd4d35655af..f06fe1c636e5 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -2941,32 +2941,54 @@ i915_gem_object_pwrite_gtt(struct drm_i915_gem_object *obj, > return 0; > } > > +static void i915_gem_client_mark_guilty(struct drm_i915_file_private *file_priv, > + const struct i915_gem_context *ctx) > +{ > + unsigned int score; > + unsigned long prev_hang; > + > + if (i915_gem_context_is_banned(ctx)) > + score = I915_CLIENT_SCORE_CONTEXT_BAN; > + else > + score = 0; > + > + prev_hang = xchg(&file_priv->hang_timestamp, jiffies); > + if (time_before(jiffies, prev_hang + I915_CLIENT_FAST_HANG_JIFFIES)) > + score += I915_CLIENT_SCORE_HANG_FAST; > + > + if (score) { > + atomic_add(score, &file_priv->ban_score); > + > + DRM_DEBUG_DRIVER("client %s: gained %u ban score, now %u\n", > + ctx->name, score, > + atomic_read(&file_priv->ban_score)); > + } > +} > + > static void i915_gem_context_mark_guilty(struct i915_gem_context *ctx) > { > - bool banned; > + unsigned int score; > + bool banned, bannable; > > atomic_inc(&ctx->guilty_count); > > - banned = false; > - if (i915_gem_context_is_bannable(ctx)) { > - unsigned int score; > + bannable = i915_gem_context_is_bannable(ctx); > + score = atomic_add_return(CONTEXT_SCORE_GUILTY, &ctx->ban_score); > + banned = score >= CONTEXT_SCORE_BAN_THRESHOLD; > > - score = atomic_add_return(CONTEXT_SCORE_GUILTY, > - &ctx->ban_score); > - banned = score >= CONTEXT_SCORE_BAN_THRESHOLD; > + DRM_DEBUG_DRIVER("context %s: guilty %d, score %u, ban %s:%s\n", > + ctx->name, atomic_read(&ctx->guilty_count), > + score, yesno(bannable), yesno(banned)); ban: no:yes Wut? Maybe just yesno(banned && bannable) as even debug messages shouldn't strive to confuse us further. Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> -Chris _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx https://lists.freedesktop.org/mailman/listinfo/intel-gfx