Re: [PATCH v2 2/3] drm/panthor: Fix ordering in _irq_suspend()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, 25 Mar 2024 17:16:16 +0000
Liviu Dudau <liviu.dudau@xxxxxxx> wrote:

> On Mon, Mar 25, 2024 at 02:57:04PM +0100, Boris Brezillon wrote:
> > Make sure we set suspended=true last to avoid generating an irq storm
> > in the unlikely case where an IRQ happens between the suspended=true
> > assignment and the _INT_MASK update.
> > 
> > v2:
> > - New patch
> > 
> > Reported-by: Steven Price <steven.price@xxxxxxx>
> > Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx>
> > ---
> >  drivers/gpu/drm/panthor/panthor_device.h | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> > index 7ee4987a3796..3a930a368ae1 100644
> > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > @@ -325,7 +325,7 @@ static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)
> >  {												\
> >  	int cookie;										\
> >  												\
> > -	atomic_set(&pirq->suspended, true);							\
> > +	pirq->mask = 0;										\  
> 
> I think you might still have a race between _irq_suspend() and _irq_threaded_handler() where the
> status will be zero due to pirq->mask being zero, so no interrupt will be cleared but they will
> be masked (kind of the opposite problem to patch 3/3).

Right, but I'm trying to find a case where this is an issue. Yes, we
might lose events, but at the same time, when _irq_suspend() is called,
we are supposed to be idle, so all this mask=0 assignment does is
speed-up the synchronization with the irq-thread. If there's anything
we need to be done before suspending the IRQ, this should really use
its own synchronization model.

> 
> I'm starting to think that pirq->mask should be local to _irq_threaded_handler() and not be messed
> with in the other functions.

It kinda is, as we don't modify panthor_irq::mask outside the
suspend/resume (and now unplug) path, and each of these accesses has a
reason to exist:

- in the resume path, we know all IRQs are masked, and we reset the
  SW-side mask to the interrupts we want to accept before updating
  _INT_MASK. No risk of race in that one
- in the unplug path, I don't think we care about unhandled interrupts,
  because the device will become unusable after that point, so updating
  the panthor_irq::mask early and losing events should be okay.
- the suspend case has been described above. As explained, I don't think
  it matters if we lose events there, because really, if there's any
  synchronization needed, it should have happened explicitly before
  _irq_suspend() is called. The synchronize_irq() we have is just here
  to make sure there's nothing accessing registers when we turn the
  device clk/power-domain off.

> 
> >  												\
> >  	if (drm_dev_enter(&pirq->ptdev->base, &cookie)) {					\
> >  		gpu_write(pirq->ptdev, __reg_prefix ## _INT_MASK, 0);				\  
> 
> If you move the line above before the if condition, do you still need patch 3/3?

The whole point of the drm_dev_enter/exit() section was to prevent
access to registers after the device has been unplugged, so, if I move
the gpu_write() outside of this block, I'd rather drop the entire
drm_dev_enter/exit() section (both here and in _irq_resume()). That
should be safe actually, as I don't expect the PM hooks or the reset
handler to be called after the device and its resource have been
removed, and those are the two only paths where _irq_suspend/resume()
can be called.



[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux