RE: [PATCH 1/3] can: rcar_canfd: Fix IRQ storm on global fifo receive

Hi Marc,

> Subject: RE: [PATCH 1/3] can: rcar_canfd: Fix IRQ storm on global fifo receive
> 
> Hi Marc,
> 
> > Subject: Re: [PATCH 1/3] can: rcar_canfd: Fix IRQ storm on global fifo receive
> >
> > On 24.10.2022 16:55:56, Biju Das wrote:
> > > Hi Marc,
> > > > Subject: Re: [PATCH 1/3] can: rcar_canfd: Fix IRQ storm on global fifo receive
> > > >
> > > > On 24.10.2022 17:37:35, Marc Kleine-Budde wrote:
> > > > > On 22.10.2022 09:15:01, Biju Das wrote:
> > > > > > We are seeing IRQ storm on global receive IRQ line under heavy
> > > > > > CAN bus load conditions with both CAN channels are enabled.
> > > > > >
> > > > > > Conditions:
> > > > > >   The global receive IRQ line is shared between can0 and can1, either
> > > > > >   of the channels can trigger interrupt while the other channel irq
> > > > > >   line is disabled(rfie).
> > > > > >   When global receive IRQ interrupt occurs, we mask the interrupt in
> > > > > >   irqhandler. Clearing and unmasking of the interrupt is happening in
> > > > > >   rx_poll(). There is a race condition where rx_poll unmask the
> > > > > >   interrupt, but the next irq handler does not mask the irq due to
> > > > > >   NAPIF_STATE_MISSED flag.
> > > > >
> > > > > Why does this happen? Is it a problem that you call
> > > > > rcar_canfd_handle_global_receive() for a channel that has the IRQs
> > > > > actually disabled in hardware?
> > > >
> > > > Can you check if the IRQ is active _and_ enabled before handling the
> > > > IRQ on a particular channel?
> > >
> > > You mean IRQ handler or rx_poll()??
> >
> > I mean the IRQ handler.
> >
> > Consider the IRQ for channel0 is disabled but active and the IRQ for
> > channel1 is enabled and active. The
> > rcar_canfd_global_receive_fifo_interrupt() will iterate over both
> > channels, and rcar_canfd_handle_global_receive() will serve the
> > channel0 IRQ, even if the IRQ is _not_ enabled. So I suggested to only
> > handle a channel's RX IRQ if that IRQ is actually enabled.
> >
> > Assuming "cc & RCANFD_RFCC_RFIE" checks if IRQ is enabled:
> 
> 
> >
> > index 567620d215f8..ea828c1bd3a1 100644
> > --- a/drivers/net/can/rcar/rcar_canfd.c
> > +++ b/drivers/net/can/rcar/rcar_canfd.c
> > @@ -1157,11 +1157,13 @@ static void rcar_canfd_handle_global_receive(struct rcar_canfd_global *gpriv, u3
> >  {
> >         struct rcar_canfd_channel *priv = gpriv->ch[ch];
> >         u32 ridx = ch + RCANFD_RFFIFO_IDX;
> > -       u32 sts;
> > +       u32 sts, cc;
> >
> >         /* Handle Rx interrupts */
> >         sts = rcar_canfd_read(priv->base, RCANFD_RFSTS(gpriv, ridx));
> > -       if (likely(sts & RCANFD_RFSTS_RFIF)) {
> > +       cc = rcar_canfd_read(priv->base, RCANFD_RFCC(gpriv, ridx));
> > +       if (likely(sts & RCANFD_RFSTS_RFIF &&
> > +                  cc & RCANFD_RFCC_RFIE)) {
> >                 if (napi_schedule_prep(&priv->napi)) {
> >                         /* Disable Rx FIFO interrupts */
> >                         rcar_canfd_clear_bit(priv->base,
> >
> > Please check if that fixes your issue.

Yes, it fixes the issue.
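
For reference, the check can be modelled outside the driver as a small
standalone sketch. The register struct, the helper name and the bit positions
below are assumptions made for illustration; they are not taken from
rcar_canfd.c and should be checked against the driver before being relied on.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only; verify against the driver. */
#define RFSTS_RFIF (1u << 3) /* RX FIFO interrupt flag (pending) */
#define RFCC_RFIE  (1u << 1) /* RX FIFO interrupt enable */

/* Hypothetical model of one channel's RX FIFO registers. */
struct rx_fifo_regs {
	uint32_t sts; /* status register: hardware sets RFIF here */
	uint32_t cc;  /* configuration/control register: RFIE lives here */
};

/*
 * Serve a channel only when its RX interrupt is both pending and enabled.
 * A pending flag on a masked channel is left for the poll routine.
 */
static bool rx_irq_should_handle(const struct rx_fifo_regs *regs)
{
	assert(regs != NULL);
	return (regs->sts & RFSTS_RFIF) && (regs->cc & RFCC_RFIE);
}
```

With channel0 pending but masked and channel1 pending and enabled, only
channel1 is served in this model, which matches the behaviour the snippet
above is meant to give.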

> 
> >
> > > The IRQ handler checks the status and disables (masks) the IRQ line.
> > > rx_poll() clears the status and enables (unmasks) the IRQ line.
> > >
> > > The status flag is set by HW whether the line is disabled or enabled.
> > >
> > > Channel0 and channel1 have 2 IRQ lines within the IP, which are ORed
> > > together to provide the global receive interrupt (shared line).
> >
> > > > A more clearer approach would be to get rid of the global interrupt
> > > > handlers at all. If the hardware only given 1 IRQ line for more than
> > > > 1 channel, the driver would register an IRQ handler for each channel
> > > > (with the shared attribute). The IRQ handler must check, if the
> > > > IRQ is
> >                      ^^^^^^^^^
> > That should be "flag".
> OK.
> 
> >
> > > > pending and enabled. If not return IRQ_NONE, otherwise handle and
> > > > return IRQ_HANDLED.
> > >
> > > That involves restructuring the IRQ handler altogether.
> >
> > ACK
> >
> > > RZ/G2L has shared line for rx fifos {ch0 and ch1} -> 2 IRQ routine
> > > with shared attributes.
> >
> > It's the same IRQ handler (or IRQ routine), but called 1x for each
> > channel, so 2x in total. The SHARED is actually a IRQ flag in the 4th
> > argument in the devm_request_irq() function.
> >
> > | devm_request_irq(..., ..., ..., IRQF_SHARED, ..., ...);
> >
> > > R-Car SoCs has shared line for rx fifos {ch0 and ch1} and error
> > > interrupts->3 IRQ routines with shared attributes.
> >
> > > R-CarV3U SoCs has shared line for rx fifos {ch0 to ch8} and error
> > > interrupts->9 IRQ routines with shared attributes.
> >
> > I think you got the point, I just wanted to point out the usual way
> > they are called.
> >
> > > Yes, I can send follow up patches for migrating to shared interrupt
> > > handlers as enhancement. Please let me know.
> >
> > Please check if my patch snippet from above works. To fix the IRQ
> > storm problem I'd like to have a simple and short solution that can go
> > into stable before restructuring the IRQ handlers.
> 
> OK, I will provide feedback tomorrow.

I will send V2 with these changes.
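
Separately, for the later restructuring, the per-channel shared handler
discussed above could look roughly like the user-space model below. It is only
a sketch: the sketch_* names, the enum standing in for irqreturn_t and the bit
positions are all hypothetical, and no kernel API is called.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED; not the real definitions. */
enum sketch_irqreturn { SKETCH_IRQ_NONE = 0, SKETCH_IRQ_HANDLED = 1 };

/* Assumed bit positions, for illustration only. */
#define RFSTS_RFIF (1u << 3) /* RX FIFO interrupt flag (pending) */
#define RFCC_RFIE  (1u << 1) /* RX FIFO interrupt enable */

/* Hypothetical per-channel state. */
struct sketch_channel {
	uint32_t sts;           /* status register image */
	uint32_t cc;            /* control register image */
	unsigned int scheduled; /* how often polling was scheduled */
};

/* One handler instance per channel, all registered on the same shared line. */
static enum sketch_irqreturn sketch_channel_rx_irq(struct sketch_channel *ch)
{
	assert(ch != NULL);

	/* Not pending or not enabled: this interrupt is not ours. */
	if (!(ch->sts & RFSTS_RFIF) || !(ch->cc & RFCC_RFIE))
		return SKETCH_IRQ_NONE;

	ch->cc &= ~RFCC_RFIE; /* mask until the poll routine has drained the FIFO */
	ch->scheduled++;      /* stands in for scheduling NAPI */
	return SKETCH_IRQ_HANDLED;
}

/* Model of the shared line firing: every registered handler is called once. */
static unsigned int sketch_fire_shared_line(struct sketch_channel *chs, size_t n)
{
	unsigned int handled = 0;

	for (size_t i = 0; i < n; i++)
		if (sketch_channel_rx_irq(&chs[i]) == SKETCH_IRQ_HANDLED)
			handled++;
	return handled;
}
```

In this model a channel whose flag is pending but whose interrupt is masked
reports SKETCH_IRQ_NONE, so it is never scheduled a second time while its poll
routine is still running.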

Cheers,
Biju



