Sv: MCP251xFD runs interrupt handler twice per message.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On 14.01.2022 13:43:42, Markus Mirevik wrote:
> > > > I have performance problems with mcp2518fd. I have custom board
> > > > based of beagleboard black. (Sitara am335x) using two mcp2518fd.
> > > > The can communication uses a lot of CPU. irq/64-spi1.1 uses around
> > > > ~20% at
> > > > 1000 can messages/s.
> > > >
> > > > I have several questions but I'll start with the most confusing. I
> > > > have found that every can messages fires two interrupts on my board.
> > >
> > > Do you mean every RX'ed CAN frame?
> > Yes, every received CAN frame.
> >
> > > Which interrupts does increase twice?
> >
> > This one:
> > 64:  4  44e07000.gpio  22 Level     spi1.1
> 
> Ok, that's the IRQ line from the mcp251xfd.

ACK.

> 
> > > > I have tested this in several ways. If I send one normal can
> > > > message the counter in /proc/interrupts is increased twice. I have
> > > > also put some printouts in mcp251xfd-core.c that confirms that
> > > > mcp251xfd_irq() is run twice and the second time intf_pending is 0.
> > >
> > > You mean mcp251xfd_irq() is started twice?
> >
> > Yes.
> 
> That's not the way it's supposed to be. Maybe it's a limitation of the am335x
> IRQ controller.

What kind of limitation could that be? 

> 
> > > The big loop
> > > (https://elixir.bootlin.com/linux/v5.16/source/drivers/net/can/spi/m
> > > cp251xf
> > > d/mcp251xfd-core.c#L2182)
> > > in mcp251xfd_irq() is run twice, and the IRQ handler is left only if
> > > there are no pending IRQs.
> > >
> > > The main IRQ handler loop (omitting the rx-int) is straight forward,
> > > and not
> > > optimized:
> > > - read pending IRQs [*]
> > > - exit if there are no pending IRQs
> > > - handle pending IRQs
> > >   for RX:
> > >   - read RX-FIFO level [*]
> > >   - read RX'ed CAN frames [*]
> > >   - mark RX'ed CAN frames as read [*]
> > > - loop
> > >
> > > All actions marked with a [*] require a SPI transfer, which results
> > > in 5 SPI transfers (read pending IRQs is done twice) per RX'ed CAN frame.
> > > (Assuming there is only 1 RX'ed frame pending and no pending IRQs
> > > after the first loop.)
> > >
> > > I have some ideas how to optimize this:
> > > - do a single SPI transfer instead of several small ones:
> > >   e.g. pending IRQs + RX-FIFO level, or read RX'ed frames + mark as
> > > read
> > > - opportunistically assume RX'ed frame on interrupt and get rid of 1st
> > >   read pending IRQs
> > > - assume TX done IRQ, if CAN frames have been TX'ed but not marked as
> > >   sent hardware
> > > - coalesce RX IRQs
> > > - coalesce TX done IRQs
> > > - combine several TX frames into single SPI transfer
> > >
> >
> > Indeed there is a lot to work on, I'm a bit out of my depth here and
> > as I said the first question is about why it interrupt is fired twice.
> >
> > It's not only the big loop that is run twice. With rx-int active what
> > I can observe is this:
> >
> > - read RX-FIFO level [*]
> > - read RX'ed CAN frames [*]
> > - mark RX'ed CAN frames as read [*]
> > - read pending IRQs [*]
> > - exit if there are no pending IRQs
> >  -> No Pending IRQ's Exiting.
> 
> So far that look correct, but the extra call to mcp251xfd_irq() that follow is
> not correct.
> 
> > - read pending IRQs [*]
> > - exit if there are no pending IRQs
> > -> No Pending IRQ's Exiting.
> 
> > > > And for testing purposes I changed the interrupt config to trigger
> > > > on falling edge instead of level and with this configured there is
> > > > only one interrupt fired. But I guess this will risk locking the
> > > > interrupt line low if an edge isn't detected.
> > >
> > > ACK - Please don't use edge triggered IRQs, sooner or later the
> > > driver will miss an IRQ resulting in a handing driver. Always use
> > > level triggered with the mcp251xfd.
> > >
> >
> > Yes, I figured that.
> >
> > > > I have tried both with rx-int active and inactive. My scope shows
> > > > really nice signals and that rx-int and int is deactivated
> > > > simultaneous on incoming frames.
> > >
> > > rx-int is an optimization to skip the first get IRQ pending
> > > transfer, as it indicates RX'ed CAN frames.
> >
> > OFFTOPIC:
> > Shouldn’t a return after the rx-int part be ok to skip the next get IRQ
> pending transfer as well?
> 
> That's a possible optimization.
> 
> I would look at the number of pending TX CAN frames. The higher the
> number, the higher the chance that some have been send.
> 
> > if (priv->rx_int) {
> >   do {
> >     int rx_pending;
> >     rx_pending = gpiod_get_value_cansleep(priv->rx_int);
> >
> >     if (!rx_pending)
> >       break;
> >
> >     err = mcp251xfd_handle(priv, rxif);
> >     if (err)
> >       goto out_fail;
> >
> >     handled = IRQ_HANDLED;
> >   } while (1);
> >
> >   if (handled == IRQ_HANDLED) {
> >     return handled;
> >   }
> > }
> > ...
> >
> > Even better would be to check if (IRQ.VALUE == INACTIVE) but I don
> > know how to that..
> 
> I think this is not possible in the Linux kernel anymore. There was a function
> to get the GPIO associated with an IRQ, but that's not available anymore. In
> general IRQs are not associated to GPIO, that's probably only the case for
> SoCs, but not for other busses like PCI.
> 

OK.


> [...]
> 
> > > Can you test again with IRQ_TYPE_LEVEL_LOW and no rx-int-gpios.
> > > Please instrument the beginning and the returns of the
> > > mcp251xfd_irq() function to check if it's really started twice.
> > > Please print the
> > > priv->regs_status.intf in every loop. Can you record the gpio2_5
> > > priv->with a
> > > scope.
> >
> > Yes I can but what do you mean with Instrument?
> 
> That's a fancy way of saying: please add printk() :)
> 
> > And no scope until Monday.
> 
> no problem
> 
> > > Make sure that you don't send any CAN frames as reaction of reception.
> >
> > How do I make sure of that? :/
> 
> The Linux kernel doesn't send any CAN frames on its own. You need a user
> space program to do so. Make sure you don't have any user space program
> running that's sending CAN frames.
> 

No user space program running that would send can frames. And no can frames received on the other side.  

With IRQ_TYPE_LEVEL_LOW and the rx-int-gpios disabled I have added a call to netdev_info at the top of the function: 
...
        netdev_info(priv->ndev, "mcp251xfd_irq mcp251xfd_regs_status.intf: %d", priv->regs_status.intf );
        if (priv->rx_int) {
...

and here:
... 
               if (!(intf_pending)) {
                        netdev_info(priv->ndev, "!intf_pending mcp251xfd_regs_status.intf: %d", priv->regs_status.intf );
                        return handled;
                }
                netdev_info(priv->ndev, "intf_pending mcp251xfd_regs_status.intf: %d", priv->regs_status.intf );
...

Resulting in this when receiving one can message: 

[  987.486691] mcp251xfd spi1.1 can1: mcp251xfd_irq mcp251xfd_regs_status.intf: -1088815100
[  987.494987] mcp251xfd spi1.1 can1: intf_pending mcp251xfd_regs_status.intf: -1088815098
[  987.503239] mcp251xfd spi1.1 can1: !intf_pending mcp251xfd_regs_status.intf: -1088815100
[  987.511423] mcp251xfd spi1.1 can1: mcp251xfd_irq mcp251xfd_regs_status.intf: -1088815100
[  987.519603] mcp251xfd spi1.1 can1: !intf_pending mcp251xfd_regs_status.intf: -1088815100

mcp251xfd_regs_status.intf translated to binary 
1	1111 1111 1111 1111 1111 1111 1111 1111 1011 1111 0001 1010 0000 0000 0000 0100
2	1111 1111 1111 1111 1111 1111 1111 1111 1011 1111 0001 1010 0000 0000 0000 0110
3	1111 1111 1111 1111 1111 1111 1111 1111 1011 1111 0001 1010 0000 0000 0000 0100
4	1111 1111 1111 1111 1111 1111 1111 1111 1011 1111 0001 1010 0000 0000 0000 0100
5	1111 1111 1111 1111 1111 1111 1111 1111 1011 1111 0001 1010 0000 0000 0000 0100


I have also attached a  scope picture, this is without the printouts. 

And if I'm reading it correctly it confirms that it runs twice. 

1 read pending IRQs [*]
-RXIF-
2 read RX-FIFO level [*]
3 read RX'ed CAN frames [*]
4 and 5 mark RX'ed CAN frames as read [*] (This is the 5.10.79 version that uses regmap_update_bits which I assume uses one read and one write instruction.) 
-loop-
6 read pending IRQs [*]
-Exit-
7 read pending IRQs [*]

Confirmd with printouts:
[   26.990324] mcp251xfd spi1.1 can1: mcp251xfd_irq()
[   26.995173] mcp251xfd spi1.1 can1: * Bulk read MCP251XFD_REG_INT
[   27.001371] mcp251xfd spi1.1 can1: Interrupts pending, handling.
[   27.007435] mcp251xfd spi1.1 can1: * Read RX MCP251XFD_REG_FIFOSTA
[   27.013676] mcp251xfd spi1.1 can1: * Bulk read. Register: 1248 count: 5
[   27.020226] mcp251xfd spi1.1 can1: * Update bits MCP251XFD_REG_FIFOCON_UINC
[   27.027318] mcp251xfd spi1.1 can1: * Bulk read MCP251XFD_REG_INT
[   27.033378] mcp251xfd spi1.1 can1: No interrupts pending, returning.
[   27.039805] mcp251xfd spi1.1 can1: mcp251xfd_irq()
[   27.044617] mcp251xfd spi1.1 can1: * Bulk read MCP251XFD_REG_INT
[   27.050693] mcp251xfd spi1.1 can1: No interrupts pending, returning.

Regards
Markus 

Attachment: scope.png
Description: scope.png


[Index of Archives]     [Automotive Discussions]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [CAN Bus]

  Powered by Linux