Re: [PATCH] drm/i915/gt: Restart the heartbeat timer when forcing a pulse

Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx> · Tue, 13 Feb 2024 01:29:37 +0100

Hi John,

On Wed, Jan 10, 2024 at 01:02:16PM -0800, John.C.Harrison@xxxxxxxxx wrote:
> From: John Harrison <John.C.Harrison@xxxxxxxxx>
> 
> The context persistence code does things like send super high priority
> heartbeat pulses to ensure any leaked context can still be pre-empted
> and thus isn't a total denial of service but only a minor denial of
> service. Unfortunately, it wasn't bothering to restart the heatbeat

/heatbeat/heartbeat/

> worker with a fresh timeout. Thus, if a persistent context happened to
> be closed just before the heartbeat was going to go ping anyway then
> the forced pulse would get a negligble execution time. And as the
> forced pulse is super high priority, the worker thread's next step is
> a reset. Which means a potentially innocent system randomly goes boom
> when attempting to close a context. So, force a re-schedule of the
> worker thread with the appropriate timeout.
> 
> Signed-off-by: John Harrison <John.C.Harrison@xxxxxxxxx>
> ---
>  drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> index 1a8e2b7db0138..4ae2fa0b61dd4 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
> @@ -290,6 +290,9 @@ static int __intel_engine_pulse(struct intel_engine_cs *engine)
>  	heartbeat_commit(rq, &attr);
>  	GEM_BUG_ON(rq->sched.attr.priority < I915_PRIORITY_BARRIER);
>  
> +	/* Ensure the forced pulse gets a full period to execute */
> +	next_heartbeat(engine);

I think it makes sense to have this extra heardbeat here and,
as I've been mulling over it, I don't any harm.

The failure doesn't look related, either.

Reviewed-by: Andi Shyti <andi.shyti@xxxxxxxxxxxxxxx>

Thanks,
Andi

>  	return 0;
>  }
>  
> -- 
> 2.43.0