Re: [PATCH 1/2] drm/tidss: Clear the interrupt status for interrupts being disabled

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Ah okay, go figure.  There was an updated patch series emailed between
before I finished writing this email. So please ignore...


On Mon, Oct 21, 2024 at 10:46 AM Jon Cormier <jcormier@xxxxxxxxxxxxxxxx> wrote:
>
> Adding the e2e thread that has instigated this change.
>
> https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1394222/am625-issue-about-tidss-rcu_preempt-self-detected-stall-on-cpu?pifragment-323307=1#pifragment-323307=2
>
> Summary of original problem: An AM62x device using the TIDSS driver,
> can lock up after hours of running.  The lock ups are often detected
> by the rcu_preempt system.  The lock ups turned out to be caused by an
> infinite interrupt loop (irq storm?) in the TIDSS_DISPC driver.
>
> The k3_clear_irqstatus function which is responsible for clearing the
> interrupt bits, only clear the the level 1 interrupts if the level 2
> ones are set.  This leaves a small window where if for whatever reason
> the level 2 interrupts aren't set but the level 1's are, then we will
> never clear the level 1 interrupt.
>
> The change as submitted is not sufficient to prevent the irq storm.
> I've tested these two patches for several weeks now and they reduce
> the frequency of the irq storm from once a day to once every few days,
> but don't prevent it.
>
> I suggest that the k3_clear_irqstatus function needs to be updated
> such that it's not possible for the level 1 DISPC_IRQSTATUS bit to
> remain uncleared.
>
> The following hack proposed by Bin and team removes the possibility of
> the irq storm happening, while introducing a small chance of clearing
> interrupts that weren't intended.  Though I would assume that if the
> level 2 interrupts aren't cleared, they would reassert the level 1
> DISPC_IRQSTATUS so maybe that's not much of a risk.  Most other
> drivers when clearing interrupts do a read and then write to clear
> interrupts so there is precedence.
>
> diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c
> b/drivers/gpu/drm/tidss/tidss_dispc.c
> index 60f69be36692..0b8a3d999c54 100644
> --- a/drivers/gpu/drm/tidss/tidss_dispc.c
> +++ b/drivers/gpu/drm/tidss/tidss_dispc.c
> @@ -900,27 +900,27 @@ static
>  void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t
> clearmask)
>  {
>      unsigned int i;
> -    u32 top_clear = 0;
>
>      for (i = 0; i < dispc->feat->num_vps; ++i) {
>          if (clearmask & DSS_IRQ_VP_MASK(i)) {
>              dispc_k3_vp_write_irqstatus(dispc, i, clearmask);
> -            top_clear |= BIT(i);
>          }
>      }
> +
>      for (i = 0; i < dispc->feat->num_planes; ++i) {
>          if (clearmask & DSS_IRQ_PLANE_MASK(i)) {
>              dispc_k3_vid_write_irqstatus(dispc, i, clearmask);
> -            top_clear |= BIT(4 + i);
>          }
>      }
> +
>      if (dispc->feat->subrev == DISPC_K2G)
>          return;
>
> -    dispc_write(dispc, DISPC_IRQSTATUS, top_clear);
> -
> -    /* Flush posted writes */
> -    dispc_read(dispc, DISPC_IRQSTATUS);
> +    /* Always clear the level 1 irqstatus (DISPC_IRQSTATUS) unconditionally
> Note I'm not sure we are confident in the reasoning outlined in this comment
> +     * due to an IP bug where level 1 irq status (DISPC_IRQSTATUS)
> would get set delayed even
> +     * after level 2 interrupt (DISPC_VID_IRQSTATUS,
> DISPC_VP_IRQSTATUS) is cleared.
> +     */
> +    dispc_write(dispc, DISPC_IRQSTATUS, dispc_read(dispc, DISPC_IRQSTATUS));
>
> I had proposed a more complete version of this patch but there hasn't
> been much discussion about it and I've mostly tested Bins version.
>
>  }
>
>
> On Mon, Oct 21, 2024 at 7:15 AM Tomi Valkeinen
> <tomi.valkeinen@xxxxxxxxxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On 12/10/2024 18:07, Devarsh Thakkar wrote:
> > > It is possible that dispc_{k2g/k3}_set_irqenable can be called for
> > > disabling some interrupt events which were previously enabled. However
> > > instead of clearing any pending events for the interrupt events that are
> > > required to be disabled, it was instead clearing the new interrupt events
> > > which were not even enabled.
> >
> > That's on purpose. When we enable a new interrupt, we want to first
> > clear the irqstatus for that interrupt to make sure there's no old
> > status left lying around. If I'm not mistaken, enabling an interrupt
> > with an irqstatus bit set will immediately trigger the interrupt.
> >
> > > For e.g. While disabling the vsync events, dispc_k3_set_irqenable tries to
> > > clear DSS_IRQ_DEVICE_OCP_ERR which was not enabled per the old_mask at all
> > > as shown below :
> > >
> > > "dispc_k3_set_irqenable : irqenabled - mask = 91, old = f0, clr = 1" where
> > > clr = (mask ^ old_mask) & old_mask
> >
> > That's a bit odd... Why was the DSS_IRQ_DEVICE_OCP_ERR not already
> > enabled? It is enabled in the tidss_irq_install().
> >
> > Or maybe it had been enabled by the driver, but as the HW doesn't
> > support that bit, it reads always as 0. I have an unsent patch to drop
> > DSS_IRQ_DEVICE_OCP_ERR.
> >
> > > This corrects the bit mask to make sure that it always clears any pending
> > > interrupt events that are requested to be disabled before disabling them
> > > actually.
> >
> > I think the point here makes sense: if we disable interrupts with
> > dispc_set_irqenable(), we don't want to see interrupt handling for the
> > disabled interrupts after the call.
> >
> > However, if you clear the irqstatus for an interrupt that will be
> > disabled, but clear it _before_ disabling the interrupt, the interrupt
> > might trigger right after clearing the irqstatus but before disabling it.
> >
> >   Tomi
> >
> > > Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem")
> > > Reported-by: Jonathan Cormier <jcormier@xxxxxxxxxxxxxxxx>
> > > Signed-off-by: Devarsh Thakkar <devarsht@xxxxxx>
> > > ---
> > >   drivers/gpu/drm/tidss/tidss_dispc.c | 8 ++++----
> > >   1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c
> > > index 1ad711f8d2a8..b04419b24863 100644
> > > --- a/drivers/gpu/drm/tidss/tidss_dispc.c
> > > +++ b/drivers/gpu/drm/tidss/tidss_dispc.c
> > > @@ -700,8 +700,8 @@ void dispc_k2g_set_irqenable(struct dispc_device *dispc, dispc_irq_t mask)
> > >   {
> > >       dispc_irq_t old_mask = dispc_k2g_read_irqenable(dispc);
> > >
> > > -     /* clear the irqstatus for newly enabled irqs */
> > > -     dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & mask);
> > > +     /* clear the irqstatus for irqs that are being disabled now */
> > > +     dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & old_mask);
> > >
> > >       dispc_k2g_vp_set_irqenable(dispc, 0, mask);
> > >       dispc_k2g_vid_set_irqenable(dispc, 0, mask);
> > > @@ -843,8 +843,8 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc,
> > >
> > >       old_mask = dispc_k3_read_irqenable(dispc);
> > >
> > > -     /* clear the irqstatus for newly enabled irqs */
> > > -     dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask);
> > > +     /* clear the irqstatus for irqs that are being disabled now */
> > > +     dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & old_mask);
> > >
> > >       for (i = 0; i < dispc->feat->num_vps; ++i) {
> > >               dispc_k3_vp_set_irqenable(dispc, i, mask);
> >
>
>
> --
> Jonathan Cormier
> Software Engineer
>
> Voice:  315.425.4045 x222
>
>
>
> http://www.CriticalLink.com
> 6712 Brooklawn Parkway, Syracuse, NY 13211



-- 
Jonathan Cormier
Software Engineer

Voice:  315.425.4045 x222



http://www.CriticalLink.com
6712 Brooklawn Parkway, Syracuse, NY 13211




[Index of Archives]     [Linux DRI Users]     [Linux Intel Graphics]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [XFree86]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Linux Kernel]     [Linux SCSI]     [XFree86]
  Powered by Linux