On 25/06/2021 17:02, Boris Brezillon wrote:
> On Fri, 25 Jun 2021 16:55:12 +0100
> Steven Price <steven.price@xxxxxxx> wrote:
>
>> On 25/06/2021 14:33, Boris Brezillon wrote:
>>> This is not yet needed because we let active jobs be killed by the
>>> reset and we don't really bother making sure they can be restarted.
>>> But once we start adding soft-stop support, controlling when we deal
>>> with the remaining interrupts and making sure those are handled before
>>> the reset is issued gets tricky if we keep job interrupts active.
>>>
>>> Let's prepare for that and mask+flush job IRQs before issuing a reset.
>>>
>>> Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx>
>>> ---
>>>  drivers/gpu/drm/panfrost/panfrost_job.c | 21 +++++++++++++++------
>>>  1 file changed, 15 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
>>> index 88d34fd781e8..0566e2f7e84a 100644
>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>> @@ -34,6 +34,7 @@ struct panfrost_queue_state {
>>>  struct panfrost_job_slot {
>>>  	struct panfrost_queue_state queue[NUM_JOB_SLOTS];
>>>  	spinlock_t job_lock;
>>> +	int irq;
>>>  };
>>>
>>>  static struct panfrost_job *
>>> @@ -400,7 +401,15 @@ static void panfrost_reset(struct panfrost_device *pfdev,
>>>  	if (bad)
>>>  		drm_sched_increase_karma(bad);
>>>
>>> -	spin_lock(&pfdev->js->job_lock);
>>
>> I'm not sure it's safe to remove this lock as this protects the
>> pfdev->jobs array: I can't see what would prevent panfrost_job_close()
>> running at the same time without the lock. Am I missing something?
>
> Ah, you're right, I'll add it back.
>
>>
>>> +	/* Mask job interrupts and synchronize to make sure we won't be
>>> +	 * interrupted during our reset.
>>> +	 */
>>> +	job_write(pfdev, JOB_INT_MASK, 0);
>>> +	synchronize_irq(pfdev->js->irq);
>>> +
>>> +	/* Schedulers are stopped and interrupts are masked+flushed, we don't
>>> +	 * need to protect the 'evict unfinished jobs' loop with the job_lock.
>>> +	 */
>>>  	for (i = 0; i < NUM_JOB_SLOTS; i++) {
>>>  		if (pfdev->jobs[i]) {
>>>  			pm_runtime_put_noidle(pfdev->dev);
>>>  			pfdev->jobs[i] = NULL;
>>>  		}
>>>  	}
>>> -	spin_unlock(&pfdev->js->job_lock);
>>>
>>>  	panfrost_device_reset(pfdev);
>>>
>>> @@ -504,6 +512,7 @@ static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status)
>>>
>>>  		job = pfdev->jobs[j];
>>>  		/* Only NULL if job timeout occurred */
>>> +		WARN_ON(!job);
>>
>> Was this WARN_ON intentional?
>
> Yes, now that we mask and synchronize the irq in the reset I don't see
> any reason why we would end up with an event but no job to attach this
> event to, but maybe I missed something.

Ok - but I guess the comment above needs updating then! ;) Job timeouts
are still a thing that definitely can happen!

Steve
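
For context, the heart of the patch is the standard "mask, then
synchronize_irq()" quiesce pattern: stop the device from raising new
interrupts, then wait for any handler already in flight before touching
shared state. A minimal, self-contained sketch of that pattern follows;
the names (my_dev, MY_INT_MASK, my_dev_reset) are hypothetical
stand-ins, not the actual panfrost code.

	#include <linux/interrupt.h>
	#include <linux/io.h>

	#define MY_INT_MASK	0x08	/* hypothetical interrupt-mask register */

	struct my_dev {
		void __iomem *regs;
		int irq;
	};

	static void my_dev_reset(struct my_dev *dev)
	{
		/* Mask the IRQ at the device: no new interrupts will fire. */
		writel(0, dev->regs + MY_INT_MASK);

		/*
		 * Wait for any handler instance already running on another
		 * CPU to return. After this, no handler can be touching the
		 * state we are about to tear down.
		 */
		synchronize_irq(dev->irq);

		/* ... now safe to reset the device and rebuild driver state ... */
	}

Note that synchronize_irq() only orders us against the interrupt
handler; it says nothing about other process-context paths, which is
exactly why the job_lock question above still matters.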
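
To illustrate that locking concern, here is a deliberately simplified,
hypothetical sketch (not the real panfrost_reset()/panfrost_job_close()
code) of two paths clearing entries of a shared job array. If the reset
path drops the lock while the close path still takes it, the two can
race on the same entry:

	#include <linux/spinlock.h>

	#define NUM_SLOTS	3

	struct job;

	static DEFINE_SPINLOCK(job_lock);
	static struct job *jobs[NUM_SLOTS];

	static void close_path(void)	/* analogous to panfrost_job_close() */
	{
		int i;

		spin_lock(&job_lock);
		for (i = 0; i < NUM_SLOTS; i++)
			jobs[i] = NULL;	/* drops references, simplified */
		spin_unlock(&job_lock);
	}

	static void reset_path(void)	/* analogous to panfrost_reset() */
	{
		int i;

		/*
		 * Without job_lock here, nothing orders this loop against
		 * close_path(): both can observe the same non-NULL entry
		 * and, e.g., double-drop a reference. Masking the job IRQ
		 * doesn't help, because close_path() runs from a
		 * process-context file-release path, not from the handler.
		 */
		for (i = 0; i < NUM_SLOTS; i++)
			jobs[i] = NULL;
	}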