Re: [PATCH 1/2] drm/i915: ban badly behaving contexts

Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> · Fri, 6 Sep 2013 10:18:19 +0100

On Fri, Aug 30, 2013 at 04:19:28PM +0300, Mika Kuoppala wrote:
> Now that we have a mechanism in place to track which context
> was guilty of hanging the gpu, it is possible to punish it
> for bad behaviour.
> 
> If a context has recently submitted a faulty batchbuffer guilty of
> a gpu hang and submits another batch which hangs the gpu in quick
> succession, ban it permanently. If a ctx is banned, no more of its
> batchbuffers will be queued for execution.
> 
> There is no need for global wedge machinery anymore and
> it would be unwise to wedge the whole gpu if we have multiple
> hanging batches queued for execution. Instead just ban
> the guilty ones and carry on.
> 
> v2: Store guilty ban status bool in gpu_error instead of pointers
>     that might become dangling before hang is declared.
> 
> v3: Use return value for banned status instead of stashing state
>     into gpu_error (Chris Wilson)
> 
> v4: - rebase on top of fixed hang stats api
>     - add define for ban period
>     - rename commit and improve commit msg
> 
> v5: - rely on context banning instead of wedging the gpu
>     - beautification and fix for ban calculation (Chris)
> 
> Signed-off-by: Mika Kuoppala <mika.kuoppala@xxxxxxxxx>

I like this a lot. It is a big step away from our global policy and
makes the banning easier to comprehend.
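
For readers following the thread, the per-context policy described in the
commit message can be sketched roughly as below. This is only an
illustration, not the code from the patch: the struct, the function names
and the BAN_PERIOD_SECS value are all hypothetical, and timestamps are
plain seconds passed in by the caller.

```c
#include <stdbool.h>

/* Hypothetical ban window in seconds; the patch adds its own define. */
#define BAN_PERIOD_SECS 12UL

/* Hypothetical per-context hang bookkeeping. */
struct ctx_hang_stats {
	bool banned;             /* once set, never cleared */
	bool was_guilty;         /* has this context caused a hang before? */
	unsigned long guilty_ts; /* time of the most recent guilty hang */
};

/*
 * Record that the context was found guilty of a hang at time `now`.
 * A second guilty hang within BAN_PERIOD_SECS of the previous one
 * bans the context permanently. Returns the resulting banned status.
 */
static bool ctx_mark_guilty(struct ctx_hang_stats *hs, unsigned long now)
{
	if (!hs->banned && hs->was_guilty &&
	    now - hs->guilty_ts <= BAN_PERIOD_SECS)
		hs->banned = true;

	hs->was_guilty = true;
	hs->guilty_ts = now;
	return hs->banned;
}

/* Submission-time check: a banned context queues no more batches. */
static bool ctx_may_submit(const struct ctx_hang_stats *hs)
{
	return !hs->banned;
}
```

The point of the sketch is that the decision is local to one context: a
hang only ever changes that context's own state, so other contexts keep
submitting and nothing global needs to be wedged.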

Reviewed-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre