On Tue, 28 Nov 2023 17:10:17 +0100 AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx> wrote: > >> if (status) > >> panfrost_job_irq_handler_thread(pfdev->js->irq, (void*)pfdev); > > > > Nope, we don't need to read the STAT reg and forcibly call the threaded > > handler if it's != 0. The synchronize_irq() call should do exactly that > > (make sure all pending interrupts are processed before returning), and > > our previous job_write(pfdev, JOB_INT_MASK, 0) guarantees that no new > > interrupts will kick in after that point. > > > > Unless we synchronize_irq() *before* masking all interrupts (which would be > wrong, as some interrupt could still fire after execution of the ISR), we get > *either of* two scenarios: > > - COMP_BIT_JOB is not set, softirq thread unmasks some interrupts by > writing to JOB_INT_MASK; or > - COMP_BIT_JOB is set, hardirq handler returns IRQ_NONE, the threaded > interrupt handler doesn't get executed, jobs are not canceled. > > So if we don't forbicly call the threaded handler if RAWSTAT != 0 in there, > and if the extra check is present in the hardirq handler, and if the hardirq > handler wasn't executed already before our synchronize_irq() call (so: if the > hardirq execution has to be done to synchronize irqs), we are not guaranteeing > that jobs cancellation/dequeuing/removal/whatever-handling is done before > entering suspend. Except the job event processing should have happened before we reached that point. panfrost_xxx_suspend_irq() are just here to make sure - we're done processing pending IRQs that we started processing before the _INT_MASK update happened - we ignore new ones, if any If we end up with unprocessed JOB/MMU irqs we care about when we're here, this should be fixed by: 1. Making sure the paths updating the MMU AS are retaining a runtime PM ref (pm_runtime_get_sync()) before doing their stuff, and releasing it (pm_runtime_put()) when they are done 2. Making sure we retain a runtime PM ref while we have jobs queued to the various JM queues 3. Making sure we acquire a runtime PM ref when we are about to push a job to one of the JM queue For #2 and #3, we retain one runtime PM ref per active job, just before queuing it [1], and release the ref when the job is completed [2][3]. We're not supposed to receive interrupts if we have no active jobs, and if we do, we can safely ignore them, because there's not much we would do with those anyway. For #1, we retain the runtime PM ref when flushing TLBs of an active AS, and when destroying an active MMU context. The last operation that requires touching GPU regs is panfrost_mmu_enable(), which is called from panfrost_mmu_as_get(), which is turn is called from panfrost_job_hw_submit() after this function has acquired a runtime PM ref. All MMU updates are synchronous, and the interrupts that might result from an AS are caused by GPU jobs. Meaning that any MMU interrupt remaining when we're in the suspend path can safely be ignored. > > That, unless the suggestion was to call panfrost_job_handle_irqs() instead of > the handler thread like that (because reading it back, it makes sense to do so). Nope, the suggestion was to keep things unchanged in panfrost_job_suspend_irq(), and just add the extra is_suspended check in panfrost_job_irq_handler(). [1]https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/gpu/drm/panfrost/panfrost_job.c#L207 [2]https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/gpu/drm/panfrost/panfrost_job.c#L462 [3]https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/gpu/drm/panfrost/panfrost_job.c#L481 [4]https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/gpu/drm/panfrost/panfrost_mmu.c#L279 [5]https://elixir.bootlin.com/linux/v6.7-rc3/source/drivers/gpu/drm/panfrost/panfrost_mmu.c#L555