Re: [PATCH v2 2/3] drm/panthor: Fix ordering in _irq_suspend()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, 26 Mar 2024 09:58:17 +0000
Liviu Dudau <liviu.dudau@xxxxxxx> wrote:

> On Mon, Mar 25, 2024 at 07:02:13PM +0100, Boris Brezillon wrote:
> > On Mon, 25 Mar 2024 17:16:16 +0000
> > Liviu Dudau <liviu.dudau@xxxxxxx> wrote:
> >   
> > > On Mon, Mar 25, 2024 at 02:57:04PM +0100, Boris Brezillon wrote:  
> > > > Make sure we set suspended=true last to avoid generating an irq storm
> > > > in the unlikely case where an IRQ happens between the suspended=true
> > > > assignment and the _INT_MASK update.
> > > > 
> > > > v2:
> > > > - New patch
> > > > 
> > > > Reported-by: Steven Price <steven.price@xxxxxxx>
> > > > Signed-off-by: Boris Brezillon <boris.brezillon@xxxxxxxxxxxxx>
> > > > ---
> > > >  drivers/gpu/drm/panthor/panthor_device.h | 4 ++--
> > > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > > 
> > > > diff --git a/drivers/gpu/drm/panthor/panthor_device.h b/drivers/gpu/drm/panthor/panthor_device.h
> > > > index 7ee4987a3796..3a930a368ae1 100644
> > > > --- a/drivers/gpu/drm/panthor/panthor_device.h
> > > > +++ b/drivers/gpu/drm/panthor/panthor_device.h
> > > > @@ -325,7 +325,7 @@ static inline void panthor_ ## __name ## _irq_suspend(struct panthor_irq *pirq)
> > > >  {												\
> > > >  	int cookie;										\
> > > >  												\
> > > > -	atomic_set(&pirq->suspended, true);							\
> > > > +	pirq->mask = 0;										\    
> > > 
> > > I think you might still have a race between _irq_suspend() and _irq_threaded_handler() where the
> > > status will be zero due to pirq->mask being zero, so no interrupt will be cleared but they will
> > > be masked (kind of the opposite problem to patch 3/3).  
> > 
> > Right, but I'm trying to find a case where this is an issue. Yes, we
> > might lose events, but at the same time, when _irq_suspend() is called,
> > we are supposed to be idle, so all this mask=0 assignment does is
> > speed-up the synchronization with the irq-thread. If there's anything
> > we need to be done before suspending the IRQ, this should really use
> > its own synchronization model.  
> 
> Perf counter collection before going idle ;)

That's typically a case of synchronization that should be done
explicitly rather than relying on the synchronize_irq() we have in
_irq_suspend().

> 
> I know that's not yet on your radar, but we need to keep it in mind to
> test that scenario when we add support for keeping runtime PM alive
> during perf counters dumping.

It's definitely on our radar, but I don't think _irq_suspend() is the
right way to handle perfcnt dump synchronization. IMHO, it deserves its
own _perfcnt_suspend() function that's called before _fw_suspend() to
make sure any in-flight perfcnt dumps get flushed before the
JOB/FW irq suspend happens.

> 
> >   
> > > 
> > > I'm starting to think that pirq->mask should be local to _irq_threaded_handler() and not be messed
> > > with in the other functions.  
> > 
> > It kinda is, as we don't modify panthor_irq::mask outside the
> > suspend/resume (and now unplug) path, and each of these accesses has a
> > reason to exist:
> > 
> > - in the resume path, we know all IRQs are masked, and we reset the
> >   SW-side mask to the interrupts we want to accept before updating
> >   _INT_MASK. No risk of race in that one
> > - in the unplug path, I don't think we care about unhandled interrupts,
> >   because the device will become unusable after that point, so updating
> >   the panthor_irq::mask early and losing events should be okay.
> > - the suspend case has been described above. As explained, I don't think
> >   it matters if we lose events there, because really, if there's any
> >   synchronization needed, it should have happened explicitly before
> >   _irq_suspend() is called. The synchronize_irq() we have is just here
> >   to make sure there's nothing accessing registers when we turn the
> >   device clk/power-domain off.
> >   
> > >   
> > > >  												\
> > > >  	if (drm_dev_enter(&pirq->ptdev->base, &cookie)) {					\
> > > >  		gpu_write(pirq->ptdev, __reg_prefix ## _INT_MASK, 0);				\    
> > > 
> > > If you move the line above before the if condition, do you still need patch 3/3?  
> > 
> > The whole point of the drm_dev_enter/exit() section was to prevent
> > access to registers after the device has been unplugged, so, if I move
> > the gpu_write() outside of this block, I'd rather drop the entire
> > drm_dev_enter/exit() section (both here and in _irq_resume()). That
> > should be safe actually, as I don't expect the PM hooks or the reset
> > handler to be called after the device and its resource have been
> > removed, and those are the two only paths where _irq_suspend/resume()
> > can be called.  
> 
> Agree. It feels strange to care a lot about preventing access to registers
> (as we should) and then going and adding the unplug function which replaces
> the suspend in 3 out of 7 cases. New contributors might get confused about
> which one to use in the future.

Actually, irq_unplug() should be used in the _unplug() path, as the
name implies, I don't think that's confusing :P. My point was more on
the fact we probably don't need the dev_enter/exit() sections in the
first place, so why introduce a new helper if we can make the
_irq_suspend() helper simpler and re-usable for the _unplug() case?

> 
> What we should do is rename suspend to unplug and move the drm_dev_enter/exit()
> condition at the calling point for the cases where it makes sense.

That'd be even more confusing IMO, why would you call an helper named
_unplug() if all you want to do is temporarily suspend the IRQ?



[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux