Re: [PATCH] drm/sched: Declare entity idle only after HW submission

Lucas Stach <l.stach@xxxxxxxxxxxxxx> · Mon, 28 Jun 2021 11:46:08 +0200

On Thursday, 24.06.2021 at 16:08 +0200, Boris Brezillon wrote:
> The panfrost driver tries to kill in-flight jobs on FD close after
> destroying the FD scheduler entities. For this to work properly, we
> need to make sure the jobs popped from the scheduler entities have
> been queued at the HW level before declaring the entity idle, otherwise
> we might iterate over a list that doesn't contain those jobs.
> 
> Suggested-by: Lucas Stach <l.stach@xxxxxxxxxxxxxx>
> Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx>
> Cc: Lucas Stach <l.stach@xxxxxxxxxxxxxx>

Not sure how much a review of my own suggestion is worth, but the
implementation looks correct to me.
I don't see any downsides for the existing drivers, and it closes the
race window for drivers that want to cancel jobs on the HW submission
queue, without introducing yet another synchronization point.
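
For anyone following along, the ordering issue can be modeled with a
small single-threaded C sketch. This is only an illustration, not kernel
code: idle_order and jobs_missed_at_idle() are hypothetical names, and
the function just counts the jobs that a teardown path would fail to
find on the HW list if it looked at the exact moment the entity is
declared idle.

```c
#include <assert.h>

/* Hypothetical model, not a kernel API: where idle is signaled
 * relative to handing the popped job to the hardware. */
enum idle_order {
	IDLE_BEFORE_RUN_JOB,	/* ordering before the patch */
	IDLE_AFTER_RUN_JOB,	/* ordering after the patch */
};

/*
 * Model one scheduler iteration for an entity holding `queued` jobs.
 * Returns the number of jobs that are in neither the entity queue nor
 * the HW list at the moment the entity is declared idle.
 */
static int jobs_missed_at_idle(int queued, enum idle_order order)
{
	int on_hw = 0;	/* jobs visible on the HW submission list */
	int popped = 0;	/* jobs taken off the entity queue */
	int missed;

	assert(queued >= 0);

	/* Stand-in for popping one job from the entity queue. */
	if (queued > 0) {
		queued--;
		popped = 1;
	}

	/* Nothing popped: idle is signaled with nothing in flight. */
	if (!popped)
		return 0;

	if (order == IDLE_BEFORE_RUN_JOB) {
		/* Idle is signaled first: the popped job is not on HW yet. */
		missed = popped - on_hw;
		on_hw += popped;	/* stand-in for run_job() */
	} else {
		on_hw += popped;	/* stand-in for run_job() */
		/* Idle is signaled only once the job is on the HW list. */
		missed = popped - on_hw;
	}

	return missed;
}
```

With one queued job, the model reports one missed job for
IDLE_BEFORE_RUN_JOB and none for IDLE_AFTER_RUN_JOB, which is the window
the patch is meant to close. A real driver runs these steps concurrently
with the teardown path, so the sketch only captures the ordering, not
the timing.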

Reviewed-by: Lucas Stach <l.stach@xxxxxxxxxxxxxx>

> ---
>  drivers/gpu/drm/scheduler/sched_main.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 81496ae2602e..aa776ebe326a 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -811,10 +811,10 @@ static int drm_sched_main(void *param)
>  
>  		sched_job = drm_sched_entity_pop_job(entity);
>  
> -		complete(&entity->entity_idle);
> -
> -		if (!sched_job)
> +		if (!sched_job) {
> +			complete(&entity->entity_idle);
>  			continue;
> +		}
>  
>  		s_fence = sched_job->s_fence;
>  
> @@ -823,6 +823,7 @@ static int drm_sched_main(void *param)
>  
>  		trace_drm_run_job(sched_job, entity);
>  		fence = sched->ops->run_job(sched_job);
> +		complete(&entity->entity_idle);
>  		drm_sched_fence_scheduled(s_fence);
>  
>  		if (!IS_ERR_OR_NULL(fence)) {