On Fri, 25 Jun 2021 16:55:12 +0100 Steven Price <steven.price@xxxxxxx> wrote:

> On 25/06/2021 14:33, Boris Brezillon wrote:
> > This is not yet needed because we let active jobs be killed during
> > the reset and we don't really bother making sure they can be
> > restarted. But once we start adding soft-stop support, controlling
> > when we deal with the remaining interrupts and making sure those
> > are handled before the reset is issued gets tricky if we keep job
> > interrupts active.
> > 
> > Let's prepare for that and mask+flush job IRQs before issuing a reset.
> > 
> > Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/panfrost/panfrost_job.c | 21 +++++++++++++++------
> >  1 file changed, 15 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
> > index 88d34fd781e8..0566e2f7e84a 100644
> > --- a/drivers/gpu/drm/panfrost/panfrost_job.c
> > +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
> > @@ -34,6 +34,7 @@ struct panfrost_queue_state {
> >  struct panfrost_job_slot {
> >  	struct panfrost_queue_state queue[NUM_JOB_SLOTS];
> >  	spinlock_t job_lock;
> > +	int irq;
> >  };
> >  
> >  static struct panfrost_job *
> > @@ -400,7 +401,15 @@ static void panfrost_reset(struct panfrost_device *pfdev,
> >  	if (bad)
> >  		drm_sched_increase_karma(bad);
> >  
> > -	spin_lock(&pfdev->js->job_lock);
> 
> I'm not sure it's safe to remove this lock as this protects the
> pfdev->jobs array: I can't see what would prevent panfrost_job_close()
> running at the same time without the lock. Am I missing something?

Ah, you're right, I'll add it back.

> > +	/* Mask job interrupts and synchronize to make sure we won't be
> > +	 * interrupted during our reset.
> > +	 */
> > +	job_write(pfdev, JOB_INT_MASK, 0);
> > +	synchronize_irq(pfdev->js->irq);
> > +
> > +	/* Schedulers are stopped and interrupts are masked+flushed, we
> > +	 * don't need to protect the 'evict unfinished jobs' loop with
> > +	 * the job_lock.
> > +	 */
> >  	for (i = 0; i < NUM_JOB_SLOTS; i++) {
> >  		if (pfdev->jobs[i]) {
> >  			pm_runtime_put_noidle(pfdev->dev);
> > @@ -408,7 +417,6 @@ static void panfrost_reset(struct panfrost_device *pfdev,
> >  			pfdev->jobs[i] = NULL;
> >  		}
> >  	}
> > -	spin_unlock(&pfdev->js->job_lock);
> >  
> >  	panfrost_device_reset(pfdev);
> >  
> > @@ -504,6 +512,7 @@ static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status)
> >  
> >  		job = pfdev->jobs[j];
> >  		/* Only NULL if job timeout occurred */
> > +		WARN_ON(!job);
> 
> Was this WARN_ON intentional?

Yes, now that we mask and synchronize the irq in the reset I don't see
any reason why we would end up with an event but no job to attach this
event to, but maybe I missed something.