On Tue, 28 Nov 2023 16:10:45 +0100 AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx> wrote: > >> static void panfrost_job_handle_err(struct panfrost_device *pfdev, > >> struct panfrost_job *job, > >> unsigned int js) > >> @@ -792,9 +800,13 @@ static irqreturn_t panfrost_job_irq_handler_thread(int irq, void *data) > >> struct panfrost_device *pfdev = data; > >> > >> panfrost_job_handle_irqs(pfdev); > >> - job_write(pfdev, JOB_INT_MASK, > >> - GENMASK(16 + NUM_JOB_SLOTS - 1, 16) | > >> - GENMASK(NUM_JOB_SLOTS - 1, 0)); > >> + > >> + /* Enable interrupts only if we're not about to get suspended */ > >> + if (!test_bit(PANFROST_COMP_BIT_JOB, pfdev->is_suspending)) > > > > The irq-line is requested with IRQF_SHARED, meaning the line might be > > shared between all three GPU IRQs, but also with other devices. I think > > if we want to be totally safe, we need to also check this is_suspending > > field in the hard irq handlers before accessing the xxx_INT_yyy > > registers. > > > > This would mean that we would have to force canceling jobs in the suspend > handler, but if the IRQ never fired, would we still be able to find the > right bits flipped in JOB_INT_RAWSTAT? There should be no jobs left if we enter suspend. If there is, that's a bug we should fix, but I'm digressing. > > From what I understand, are you suggesting to call, in job_suspend_irq() > something like > > void panfrost_job_suspend_irq(struct panfrost_device *pfdev) > { > u32 status; > > set_bit(PANFROST_COMP_BIT_JOB, pfdev->is_suspending); > > job_write(pfdev, JOB_INT_MASK, 0); > synchronize_irq(pfdev->js->irq); > > status = job_read(pfdev, JOB_INT_STAT); I guess you meant _RAWSTAT. _STAT should always be zero after we've written 0 to _INT_MASK. > if (status) > panfrost_job_irq_handler_thread(pfdev->js->irq, (void*)pfdev); Nope, we don't need to read the STAT reg and forcibly call the threaded handler if it's != 0. The synchronize_irq() call should do exactly that (make sure all pending interrupts are processed before returning), and our previous job_write(pfdev, JOB_INT_MASK, 0) guarantees that no new interrupts will kick in after that point. > } > > and then while still retaining the check in the IRQ thread handler, also > check it in the hardirq handler like > > static irqreturn_t panfrost_job_irq_handler(int irq, void *data) > { > struct panfrost_device *pfdev = data; > u32 status; > > if (!test_bit(PANFROST_COMP_BIT_JOB, pfdev->is_suspending)) > return IRQ_NONE; Yes, that's the extra check I was talking about, and that's also the very reason I'm suggesting to call this field suspended_irqs instead of is_suspending. Ultimately, each bit in this bitmap encodes the status of a specific IRQ, not the transition from active-to-suspended, otherwise we'd be clearing the bit at the end of panfrost_job_suspend_irq(), right after the synchronize_irq(). But if we were doing that, our hard IRQ handler could be called because other devices raised an interrupt on the very same IRQ line while we are suspended, and we'd be doing an invalid GPU reg read while the clks/power-domains are off. > > status = job_read(pfdev, JOB_INT_STAT); > if (!status) > return IRQ_NONE; > > job_write(pfdev, JOB_INT_MASK, 0); > return IRQ_WAKE_THREAD; > } > > (rinse and repeat for panfrost_mmu) > > ..or am I misunderstanding you? > > Cheers, > Angelo > >