Re: [PATCH v2 12/12] drm/panfrost: Shorten the fence signalling section

Steven Price <steven.price@xxxxxxx> · Mon, 21 Jun 2021 16:43:11 +0100

On 21/06/2021 14:39, Boris Brezillon wrote:
> panfrost_reset() does not directly signal fences, but
> panfrost_scheduler_start() does, when calling drm_sched_start().

I have to admit to not fully understanding dma_fence_begin_signalling()
- but I thought the idea was that it should have a relatively wide
length to actually catch locking bugs. Just wrapping drm_sched_start()
looks wrong: i.e. why isn't this code just contained in drm_sched_start()?

The relevant section from the docs reads:

>  * * All code necessary to complete a &dma_fence must be annotated, from the
>  *   point where a fence is accessible to other threads, to the point where
>  *   dma_fence_signal() is called. Un-annotated code can contain deadlock issues,
>  *   and due to the very strict rules and many corner cases it is infeasible to
>  *   catch these just with review or normal stress testing

So it makes sense that we annotate code from when the reset is started
to after the signalling has happened. That way if there are any locks
taken in the reset path which could be blocked waiting for any of the
fences which might be signalled we get moaned at by lockdep.

Steve

> Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx>
> ---
>  drivers/gpu/drm/panfrost/panfrost_job.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> index 74b63e1ee6d9..cf6abe0fdf47 100644
> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> @@ -414,6 +414,7 @@ static bool panfrost_scheduler_stop(struct panfrost_queue_state *queue,
>  static void panfrost_scheduler_start(struct panfrost_queue_state *queue)
>  {
>  	enum panfrost_queue_status old_status;
> +	bool cookie;
>  
>  	mutex_lock(&queue->lock);
>  	old_status = atomic_xchg(&queue->status,
> @@ -423,7 +424,9 @@ static void panfrost_scheduler_start(struct panfrost_queue_state *queue)
>  	/* Restore the original timeout before starting the scheduler. */
>  	queue->sched.timeout = msecs_to_jiffies(JOB_TIMEOUT_MS);
>  	drm_sched_resubmit_jobs(&queue->sched);
> +	cookie = dma_fence_begin_signalling();
>  	drm_sched_start(&queue->sched, true);
> +	dma_fence_end_signalling(cookie);
>  	old_status = atomic_xchg(&queue->status,
>  				 PANFROST_QUEUE_STATUS_ACTIVE);
>  	if (old_status == PANFROST_QUEUE_STATUS_FAULT_PENDING)
> @@ -566,9 +569,7 @@ static void panfrost_reset(struct work_struct *work)
>  						     reset.work);
>  	unsigned long flags;
>  	unsigned int i;
> -	bool cookie;
>  
> -	cookie = dma_fence_begin_signalling();
>  	for (i = 0; i < NUM_JOB_SLOTS; i++) {
>  		/*
>  		 * We want pending timeouts to be handled before we attempt
> @@ -608,8 +609,6 @@ static void panfrost_reset(struct work_struct *work)
>  
>  	for (i = 0; i < NUM_JOB_SLOTS; i++)
>  		panfrost_scheduler_start(&pfdev->js->queue[i]);
> -
> -	dma_fence_end_signalling(cookie);
>  }
>  
>  int panfrost_job_init(struct panfrost_device *pfdev)
>