Re: [PATCH v2 3/3] drm/panfrost: Synchronize and disable interrupts before powering off

Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> · Tue, 28 Nov 2023 17:41:49 +0100

On Tue, 28 Nov 2023 17:10:17 +0100
AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
wrote:

> >> 	if (status)
> >> 		panfrost_job_irq_handler_thread(pfdev->js->irq, (void*)pfdev);  
> > 
> > Nope, we don't need to read the STAT reg and forcibly call the threaded
> > handler if it's != 0. The synchronize_irq() call should do exactly that
> > (make sure all pending interrupts are processed before returning), and
> > our previous job_write(pfdev, JOB_INT_MASK, 0) guarantees that no new
> > interrupts will kick in after that point.
> >   
> 
> Unless we synchronize_irq() *before* masking all interrupts (which would be
> wrong, as some interrupt could still fire after execution of the ISR), we get
> *either of* two scenarios:
> 
>   - COMP_BIT_JOB is not set, softirq thread unmasks some interrupts by
>     writing to JOB_INT_MASK; or
>   - COMP_BIT_JOB is set, hardirq handler returns IRQ_NONE, the threaded
>     interrupt handler doesn't get executed, jobs are not canceled.
> 
> So if we don't forbicly call the threaded handler if RAWSTAT != 0 in there,
> and if the extra check is present in the hardirq handler, and if the hardirq
> handler wasn't executed already before our synchronize_irq() call (so: if the
> hardirq execution has to be done to synchronize irqs), we are not guaranteeing
> that jobs cancellation/dequeuing/removal/whatever-handling is done before
> entering suspend.

Except the job event processing should have happened before we reached
that point. panfrost_xxx_suspend_irq() are just here to make sure

- we're done processing pending IRQs that we started processing before
  the _INT_MASK update happened
- we ignore new ones, if any

If we end up with unprocessed JOB/MMU irqs we care about when we're
here, this should be fixed by:

1. Making sure the paths updating the MMU AS are retaining a runtime PM
  ref (pm_runtime_get_sync()) before doing their stuff, and releasing
  it (pm_runtime_put()) when they are done

2. Making sure we retain a runtime PM ref while we have jobs queued to
   the various JM queues

3. Making sure we acquire a runtime PM ref when we are about to push a
   job to one of the JM queue

For #2 and #3, we retain one runtime PM ref per active job, just before
queuing it [1], and release the ref when the job is completed [2][3].
We're not supposed to receive interrupts if we have no active jobs, and
if we do, we can safely ignore them, because there's not much we would
do with those anyway.

For #1, we retain the runtime PM ref when flushing TLBs of an
active AS, and when destroying an active MMU context. The last
operation that requires touching GPU regs is panfrost_mmu_enable(),
which is called from panfrost_mmu_as_get(), which is turn is called
from panfrost_job_hw_submit() after this function has acquired a
runtime PM ref. All MMU updates are synchronous, and the interrupts
that might result from an AS are caused by GPU jobs. Meaning that any
MMU interrupt remaining when we're in the suspend path can safely be
ignored.

> 
> That, unless the suggestion was to call panfrost_job_handle_irqs() instead of
> the handler thread like that (because reading it back, it makes sense to do so).

Nope, the suggestion was to keep things unchanged in
panfrost_job_suspend_irq(), and just add the extra is_suspended check
in panfrost_job_irq_handler().

[1]https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/gpu/drm/panfrost/panfrost_job.c#L207
[2]https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/gpu/drm/panfrost/panfrost_job.c#L462
[3]https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/gpu/drm/panfrost/panfrost_job.c#L481
[4]https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/gpu/drm/panfrost/panfrost_mmu.c#L279
[5]https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/gpu/drm/panfrost/panfrost_mmu.c#L555