Hi Marc, > Subject: RE: [PATCH 1/3] can: rcar_canfd: Fix IRQ storm on global fifo > receive > > Hi Marc, > > > Subject: Re: [PATCH 1/3] can: rcar_canfd: Fix IRQ storm on global > fifo > > receive > > > > On 24.10.2022 16:55:56, Biju Das wrote: > > > Hi Marc, > > > > Subject: Re: [PATCH 1/3] can: rcar_canfd: Fix IRQ storm on > global > > > > fifo receive > > > > > > > > On 24.10.2022 17:37:35, Marc Kleine-Budde wrote: > > > > > On 22.10.2022 09:15:01, Biju Das wrote: > > > > > > We are seeing IRQ storm on global receive IRQ line under > heavy > > > > > > CAN bus load conditions with both CAN channels are enabled. > > > > > > > > > > > > Conditions: > > > > > > The global receive IRQ line is shared between can0 and > can1, > > > > either > > > > > > of the channels can trigger interrupt while the other > > channel > > > > irq > > > > > > line is disabled(rfie). > > > > > > When global receive IRQ interrupt occurs, we mask the > > > > > > interrupt > > > > in > > > > > > irqhandler. Clearing and unmasking of the interrupt is > > > > > > happening > > > > in > > > > > > rx_poll(). There is a race condition where rx_poll unmask > > the > > > > > > interrupt, but the next irq handler does not mask the irq > > due to > > > > > > NAPIF_STATE_MISSED flag. > > > > > > > > > > Why does this happen? Is it a problem that you call > > > > > rcar_canfd_handle_global_receive() for a channel that has the > > IRQs > > > > > actually disabled in hardware? > > > > > > > > Can you check if the IRQ is active _and_ enabled before handling > > the > > > > IRQ on a particular channel? > > > > > > You mean IRQ handler or rx_poll()?? > > > > I mean the IRQ handler. > > > > Consider the IRQ for channel0 is disabled but active and the IRQ for > > channel1 is enabled and active. The > > rcar_canfd_global_receive_fifo_interrupt() will iterate over both > > channels, and rcar_canfd_handle_global_receive() will serve the > > channel0 IRQ, even if the IRQ is _not_ enabled. So I suggested to > only > > handle a channel's RX IRQ if that IRQ is actually enabled. > > > > Assuming "cc & RCANFD_RFCC_RFI" checks if IRQ is enabled: > > > > > > index 567620d215f8..ea828c1bd3a1 100644 > > --- a/drivers/net/can/rcar/rcar_canfd.c > > +++ b/drivers/net/can/rcar/rcar_canfd.c > > @@ -1157,11 +1157,13 @@ static void > > rcar_canfd_handle_global_receive(struct rcar_canfd_global *gpriv, u3 > { > > struct rcar_canfd_channel *priv = gpriv->ch[ch]; > > u32 ridx = ch + RCANFD_RFFIFO_IDX; > > - u32 sts; > > + u32 sts, cc; > > > > /* Handle Rx interrupts */ > > sts = rcar_canfd_read(priv->base, RCANFD_RFSTS(gpriv, > ridx)); > > - if (likely(sts & RCANFD_RFSTS_RFIF)) { > > + cc = rcar_canfd_read(priv->base, RCANFD_RFCC(gpriv, ridx)); > > + if (likely(sts & RCANFD_RFSTS_RFIF && > > + cc & RCANFD_RFCC_RFIE)) { > > if (napi_schedule_prep(&priv->napi)) { > > /* Disable Rx FIFO interrupts */ > > rcar_canfd_clear_bit(priv->base, > > > > Please check if that fixes your issue. Yes, it fixes the issue. > > > > > > IRQ handler check the status and disable(mask) the IRQ line. > > > rx_poll() clears the status and enable(unmask) the IRQ line. > > > > > > Status flag is set by HW while line is in disabled/enabled state. > > > > > > Channel0 and channel1 has 2 IRQ lines within the IP which is ored > > > together to provide global receive interrupt(shared line). > > > > > > A more clearer approach would be to get rid of the global > > interrupt > > > > handlers at all. If the hardware only given 1 IRQ line for more > > than > > > > 1 channel, the driver would register an IRQ handler for each > > channel > > > > (with the shared attribute). The IRQ handler must check, if the > > IRQ > > > > is > > ^^^^^^^^^ > > That should be "flag". > OK. > > > > > > > pending and enabled. If not return IRQ_NONE, otherwise handle > and > > > > return IRQ_HANDLED. > > > > > > That involves restructuring the IRQ handler altogether. > > > > ACK > > > > > RZ/G2L has shared line for rx fifos {ch0 and ch1} -> 2 IRQ routine > > > with shared attributes. > > > > It's the same IRQ handler (or IRQ routine), but called 1x for each > > channel, so 2x in total. The SHARED is actually a IRQ flag in the > 4th > > argument in the devm_request_irq() function. > > > > | devm_request_irq(..., ..., ..., IRQF_SHARED, ..., ...); > > > > > R-Car SoCs has shared line for rx fifos {ch0 and ch1} and error > > > interrupts->3 IRQ routines with shared attributes. > > > > > R-CarV3U SoCs has shared line for rx fifos {ch0 to ch8} and error > > > interrupts->9 IRQ routines with shared attributes. > > > > I think you got the point, I just wanted to point out the usual way > > they are called. > > > > > Yes, I can send follow up patches for migrating to shared > interrupt > > > handlers as enhancement. Please let me know. > > > > Please check if my patch snippet from above works. To fix the IRQ > > storm problem I'd like to have a simple and short solution that can > go > > into stable before restructuring the IRQ handlers. > > OK, Tomorrow will provide you the feedback. I will send V2 with these changes. Cheers, Biju