On 25/06/2021 17:02, Boris Brezillon wrote:
> On Fri, 25 Jun 2021 16:55:12 +0100
> Steven Price <steven.price@xxxxxxx> wrote:
>
>> On 25/06/2021 14:33, Boris Brezillon wrote:
>>> This is not yet needed because we let active jobs be killed by the
>>> reset and we don't really bother making sure they can be restarted.
>>> But once we start adding soft-stop support, controlling when we deal
>>> with the remaining interrupts and making sure those are handled before
>>> the reset is issued gets tricky if we keep job interrupts active.
>>>
>>> Let's prepare for that and mask+flush job IRQs before issuing a reset.
>>>
>>> Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx>
>>> ---
>>>  drivers/gpu/drm/panfrost/panfrost_job.c | 21 +++++++++++++++------
>>>  1 file changed, 15 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/panfrost/panfrost_job.c b/drivers/gpu/drm/panfrost/panfrost_job.c
>>> index 88d34fd781e8..0566e2f7e84a 100644
>>> --- a/drivers/gpu/drm/panfrost/panfrost_job.c
>>> +++ b/drivers/gpu/drm/panfrost/panfrost_job.c
>>> @@ -34,6 +34,7 @@ struct panfrost_queue_state {
>>>  struct panfrost_job_slot {
>>>  	struct panfrost_queue_state queue[NUM_JOB_SLOTS];
>>>  	spinlock_t job_lock;
>>> +	int irq;
>>>  };
>>>
>>>  static struct panfrost_job *
>>> @@ -400,7 +401,15 @@ static void panfrost_reset(struct panfrost_device *pfdev,
>>>  	if (bad)
>>>  		drm_sched_increase_karma(bad);
>>>
>>> -	spin_lock(&pfdev->js->job_lock);
>>
>> I'm not sure it's safe to remove this lock as this protects the
>> pfdev->jobs array: I can't see what would prevent panfrost_job_close()
>> running at the same time without the lock. Am I missing something?
>
> Ah, you're right, I'll add it back.
>
>>
>>> +	/* Mask job interrupts and synchronize to make sure we won't be
>>> +	 * interrupted during our reset.
>>> +	 */
>>> +	job_write(pfdev, JOB_INT_MASK, 0);
>>> +	synchronize_irq(pfdev->js->irq);
>>> +
>>> +	/* Schedulers are stopped and interrupts are masked+flushed, we don't
>>> +	 * need to protect the 'evict unfinished jobs' loop with the job_lock.
>>> +	 */
>>>  	for (i = 0; i < NUM_JOB_SLOTS; i++) {
>>>  		if (pfdev->jobs[i]) {
>>>  			pm_runtime_put_noidle(pfdev->dev);
>>>  			pfdev->jobs[i] = NULL;
>>>  		}
>>>  	}
>>> -	spin_unlock(&pfdev->js->job_lock);
>>>
>>>  	panfrost_device_reset(pfdev);
>>>
>>> @@ -504,6 +512,7 @@ static void panfrost_job_handle_irq(struct panfrost_device *pfdev, u32 status)
>>>
>>>  		job = pfdev->jobs[j];
>>>  		/* Only NULL if job timeout occurred */
>>> +		WARN_ON(!job);
>>
>> Was this WARN_ON intentional?
>
> Yes, now that we mask and synchronize the irq in the reset I don't see
> any reason why we would end up with an event but no job to attach this
> event to, but maybe I missed something.

Ok - but I guess the comment above needs updating then! ;) Job timeouts
are still a thing that definitely can happen!

Steve
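
For context, the heart of the patch is the standard "mask, then
synchronize_irq()" quiesce pattern: stop the device from raising new
interrupts, then wait for any handler already in flight before touching
shared state. A minimal, self-contained sketch of that pattern follows;
the names (my_dev, MY_INT_MASK, my_dev_reset) are hypothetical
stand-ins, not the actual panfrost code.

	#include <linux/interrupt.h>
	#include <linux/io.h>

	#define MY_INT_MASK	0x08	/* hypothetical interrupt-mask register */

	struct my_dev {
		void __iomem *regs;
		int irq;
	};

	static void my_dev_reset(struct my_dev *dev)
	{
		/* Mask the IRQ at the device: no new interrupts will fire. */
		writel(0, dev->regs + MY_INT_MASK);

		/*
		 * Wait for any handler instance already running on another
		 * CPU to return. After this, no handler can be touching the
		 * state we are about to tear down.
		 */
		synchronize_irq(dev->irq);

		/* ... now safe to reset the device and rebuild driver state ... */
	}

Note that synchronize_irq() only orders us against the interrupt
handler; it says nothing about other process-context paths, which is
exactly why the job_lock question above still matters.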
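
To illustrate that locking concern, here is a deliberately simplified,
hypothetical sketch (not the real panfrost_reset()/panfrost_job_close()
code) of two paths clearing entries of a shared job array. If the reset
path drops the lock while the close path still takes it, the two can
race on the same entry:

	#include <linux/spinlock.h>

	#define NUM_SLOTS	3

	struct job;

	static DEFINE_SPINLOCK(job_lock);
	static struct job *jobs[NUM_SLOTS];

	static void close_path(void)	/* analogous to panfrost_job_close() */
	{
		int i;

		spin_lock(&job_lock);
		for (i = 0; i < NUM_SLOTS; i++)
			jobs[i] = NULL;	/* drops references, simplified */
		spin_unlock(&job_lock);
	}

	static void reset_path(void)	/* analogous to panfrost_reset() */
	{
		int i;

		/*
		 * Without job_lock here, nothing orders this loop against
		 * close_path(): both can observe the same non-NULL entry
		 * and, e.g., double-drop a reference. Masking the job IRQ
		 * doesn't help, because close_path() runs from a
		 * process-context file-release path, not from the handler.
		 */
		for (i = 0; i < NUM_SLOTS; i++)
			jobs[i] = NULL;
	}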