Ah okay, go figure. There was an updated patch series emailed between before I finished writing this email. So please ignore... On Mon, Oct 21, 2024 at 10:46 AM Jon Cormier <jcormier@xxxxxxxxxxxxxxxx> wrote: > > Adding the e2e thread that has instigated this change. > > https://e2e.ti.com/support/processors-group/processors/f/processors-forum/1394222/am625-issue-about-tidss-rcu_preempt-self-detected-stall-on-cpu?pifragment-323307=1#pifragment-323307=2 > > Summary of original problem: An AM62x device using the TIDSS driver, > can lock up after hours of running. The lock ups are often detected > by the rcu_preempt system. The lock ups turned out to be caused by an > infinite interrupt loop (irq storm?) in the TIDSS_DISPC driver. > > The k3_clear_irqstatus function which is responsible for clearing the > interrupt bits, only clear the the level 1 interrupts if the level 2 > ones are set. This leaves a small window where if for whatever reason > the level 2 interrupts aren't set but the level 1's are, then we will > never clear the level 1 interrupt. > > The change as submitted is not sufficient to prevent the irq storm. > I've tested these two patches for several weeks now and they reduce > the frequency of the irq storm from once a day to once every few days, > but don't prevent it. > > I suggest that the k3_clear_irqstatus function needs to be updated > such that it's not possible for the level 1 DISPC_IRQSTATUS bit to > remain uncleared. > > The following hack proposed by Bin and team removes the possibility of > the irq storm happening, while introducing a small chance of clearing > interrupts that weren't intended. Though I would assume that if the > level 2 interrupts aren't cleared, they would reassert the level 1 > DISPC_IRQSTATUS so maybe that's not much of a risk. Most other > drivers when clearing interrupts do a read and then write to clear > interrupts so there is precedence. > > diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c > b/drivers/gpu/drm/tidss/tidss_dispc.c > index 60f69be36692..0b8a3d999c54 100644 > --- a/drivers/gpu/drm/tidss/tidss_dispc.c > +++ b/drivers/gpu/drm/tidss/tidss_dispc.c > @@ -900,27 +900,27 @@ static > void dispc_k3_clear_irqstatus(struct dispc_device *dispc, dispc_irq_t > clearmask) > { > unsigned int i; > - u32 top_clear = 0; > > for (i = 0; i < dispc->feat->num_vps; ++i) { > if (clearmask & DSS_IRQ_VP_MASK(i)) { > dispc_k3_vp_write_irqstatus(dispc, i, clearmask); > - top_clear |= BIT(i); > } > } > + > for (i = 0; i < dispc->feat->num_planes; ++i) { > if (clearmask & DSS_IRQ_PLANE_MASK(i)) { > dispc_k3_vid_write_irqstatus(dispc, i, clearmask); > - top_clear |= BIT(4 + i); > } > } > + > if (dispc->feat->subrev == DISPC_K2G) > return; > > - dispc_write(dispc, DISPC_IRQSTATUS, top_clear); > - > - /* Flush posted writes */ > - dispc_read(dispc, DISPC_IRQSTATUS); > + /* Always clear the level 1 irqstatus (DISPC_IRQSTATUS) unconditionally > Note I'm not sure we are confident in the reasoning outlined in this comment > + * due to an IP bug where level 1 irq status (DISPC_IRQSTATUS) > would get set delayed even > + * after level 2 interrupt (DISPC_VID_IRQSTATUS, > DISPC_VP_IRQSTATUS) is cleared. > + */ > + dispc_write(dispc, DISPC_IRQSTATUS, dispc_read(dispc, DISPC_IRQSTATUS)); > > I had proposed a more complete version of this patch but there hasn't > been much discussion about it and I've mostly tested Bins version. > > } > > > On Mon, Oct 21, 2024 at 7:15 AM Tomi Valkeinen > <tomi.valkeinen@xxxxxxxxxxxxxxxx> wrote: > > > > Hi, > > > > On 12/10/2024 18:07, Devarsh Thakkar wrote: > > > It is possible that dispc_{k2g/k3}_set_irqenable can be called for > > > disabling some interrupt events which were previously enabled. However > > > instead of clearing any pending events for the interrupt events that are > > > required to be disabled, it was instead clearing the new interrupt events > > > which were not even enabled. > > > > That's on purpose. When we enable a new interrupt, we want to first > > clear the irqstatus for that interrupt to make sure there's no old > > status left lying around. If I'm not mistaken, enabling an interrupt > > with an irqstatus bit set will immediately trigger the interrupt. > > > > > For e.g. While disabling the vsync events, dispc_k3_set_irqenable tries to > > > clear DSS_IRQ_DEVICE_OCP_ERR which was not enabled per the old_mask at all > > > as shown below : > > > > > > "dispc_k3_set_irqenable : irqenabled - mask = 91, old = f0, clr = 1" where > > > clr = (mask ^ old_mask) & old_mask > > > > That's a bit odd... Why was the DSS_IRQ_DEVICE_OCP_ERR not already > > enabled? It is enabled in the tidss_irq_install(). > > > > Or maybe it had been enabled by the driver, but as the HW doesn't > > support that bit, it reads always as 0. I have an unsent patch to drop > > DSS_IRQ_DEVICE_OCP_ERR. > > > > > This corrects the bit mask to make sure that it always clears any pending > > > interrupt events that are requested to be disabled before disabling them > > > actually. > > > > I think the point here makes sense: if we disable interrupts with > > dispc_set_irqenable(), we don't want to see interrupt handling for the > > disabled interrupts after the call. > > > > However, if you clear the irqstatus for an interrupt that will be > > disabled, but clear it _before_ disabling the interrupt, the interrupt > > might trigger right after clearing the irqstatus but before disabling it. > > > > Tomi > > > > > Fixes: 32a1795f57ee ("drm/tidss: New driver for TI Keystone platform Display SubSystem") > > > Reported-by: Jonathan Cormier <jcormier@xxxxxxxxxxxxxxxx> > > > Signed-off-by: Devarsh Thakkar <devarsht@xxxxxx> > > > --- > > > drivers/gpu/drm/tidss/tidss_dispc.c | 8 ++++---- > > > 1 file changed, 4 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/gpu/drm/tidss/tidss_dispc.c b/drivers/gpu/drm/tidss/tidss_dispc.c > > > index 1ad711f8d2a8..b04419b24863 100644 > > > --- a/drivers/gpu/drm/tidss/tidss_dispc.c > > > +++ b/drivers/gpu/drm/tidss/tidss_dispc.c > > > @@ -700,8 +700,8 @@ void dispc_k2g_set_irqenable(struct dispc_device *dispc, dispc_irq_t mask) > > > { > > > dispc_irq_t old_mask = dispc_k2g_read_irqenable(dispc); > > > > > > - /* clear the irqstatus for newly enabled irqs */ > > > - dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & mask); > > > + /* clear the irqstatus for irqs that are being disabled now */ > > > + dispc_k2g_clear_irqstatus(dispc, (mask ^ old_mask) & old_mask); > > > > > > dispc_k2g_vp_set_irqenable(dispc, 0, mask); > > > dispc_k2g_vid_set_irqenable(dispc, 0, mask); > > > @@ -843,8 +843,8 @@ static void dispc_k3_set_irqenable(struct dispc_device *dispc, > > > > > > old_mask = dispc_k3_read_irqenable(dispc); > > > > > > - /* clear the irqstatus for newly enabled irqs */ > > > - dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & mask); > > > + /* clear the irqstatus for irqs that are being disabled now */ > > > + dispc_k3_clear_irqstatus(dispc, (old_mask ^ mask) & old_mask); > > > > > > for (i = 0; i < dispc->feat->num_vps; ++i) { > > > dispc_k3_vp_set_irqenable(dispc, i, mask); > > > > > -- > Jonathan Cormier > Software Engineer > > Voice: 315.425.4045 x222 > > > > http://www.CriticalLink.com > 6712 Brooklawn Parkway, Syracuse, NY 13211 -- Jonathan Cormier Software Engineer Voice: 315.425.4045 x222 http://www.CriticalLink.com 6712 Brooklawn Parkway, Syracuse, NY 13211