Re: [PATCH v2 3/3] drm/panfrost: Synchronize and disable interrupts before powering off

Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx> · Tue, 28 Nov 2023 16:53:57 +0100

On Tue, 28 Nov 2023 16:10:45 +0100
AngeloGioacchino Del Regno <angelogioacchino.delregno@xxxxxxxxxxxxx>
wrote:

> >>   static void panfrost_job_handle_err(struct panfrost_device *pfdev,
> >>   				    struct panfrost_job *job,
> >>   				    unsigned int js)
> >> @@ -792,9 +800,13 @@ static irqreturn_t panfrost_job_irq_handler_thread(int irq, void *data)
> >>   	struct panfrost_device *pfdev = data;
> >>   
> >>   	panfrost_job_handle_irqs(pfdev);
> >> -	job_write(pfdev, JOB_INT_MASK,
> >> -		  GENMASK(16 + NUM_JOB_SLOTS - 1, 16) |
> >> -		  GENMASK(NUM_JOB_SLOTS - 1, 0));
> >> +
> >> +	/* Enable interrupts only if we're not about to get suspended */
> >> +	if (!test_bit(PANFROST_COMP_BIT_JOB, pfdev->is_suspending))  
> > 
> > The irq-line is requested with IRQF_SHARED, meaning the line might be
> > shared between all three GPU IRQs, but also with other devices. I think
> > if we want to be totally safe, we need to also check this is_suspending
> > field in the hard irq handlers before accessing the xxx_INT_yyy
> > registers.
> >   
> 
> This would mean that we would have to force canceling jobs in the suspend
> handler, but if the IRQ never fired, would we still be able to find the
> right bits flipped in JOB_INT_RAWSTAT?

There should be no jobs left if we enter suspend. If there is, that's a
bug we should fix, but I'm digressing.

> 
>  From what I understand, are you suggesting to call, in job_suspend_irq()
> something like
> 
> void panfrost_job_suspend_irq(struct panfrost_device *pfdev)
> {
>          u32 status;
> 
> 	set_bit(PANFROST_COMP_BIT_JOB, pfdev->is_suspending);
> 
> 	job_write(pfdev, JOB_INT_MASK, 0);
> 	synchronize_irq(pfdev->js->irq);
> 
> 	status = job_read(pfdev, JOB_INT_STAT);

I guess you meant _RAWSTAT. _STAT should always be zero after we've
written 0 to _INT_MASK.

> 	if (status)
> 		panfrost_job_irq_handler_thread(pfdev->js->irq, (void*)pfdev);

Nope, we don't need to read the STAT reg and forcibly call the threaded
handler if it's != 0. The synchronize_irq() call should do exactly that
(make sure all pending interrupts are processed before returning), and
our previous job_write(pfdev, JOB_INT_MASK, 0) guarantees that no new
interrupts will kick in after that point.

> }
> 
> and then while still retaining the check in the IRQ thread handler, also
> check it in the hardirq handler like
> 
> static irqreturn_t panfrost_job_irq_handler(int irq, void *data)
> {
> 	struct panfrost_device *pfdev = data;
> 	u32 status;
> 
> 	if (!test_bit(PANFROST_COMP_BIT_JOB, pfdev->is_suspending))
> 		return IRQ_NONE;

Yes, that's the extra check I was talking about, and that's also the
very reason I'm suggesting to call this field suspended_irqs instead of
is_suspending. Ultimately, each bit in this bitmap encodes the status
of a specific IRQ, not the transition from active-to-suspended,
otherwise we'd be clearing the bit at the end of
panfrost_job_suspend_irq(), right after the synchronize_irq(). But if
we were doing that, our hard IRQ handler could be called because other
devices raised an interrupt on the very same IRQ line while we are
suspended, and we'd be doing an invalid GPU reg read while the
clks/power-domains are off.

> 
> 	status = job_read(pfdev, JOB_INT_STAT);
> 	if (!status)
> 	        return IRQ_NONE;
> 
> 	job_write(pfdev, JOB_INT_MASK, 0);
> 	return IRQ_WAKE_THREAD;
> }
> 
> (rinse and repeat for panfrost_mmu)
> 
> ..or am I misunderstanding you?
> 
> Cheers,
> Angelo
> 
>