Sv: MCP251xFD runs interrupt handler twice per message.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi

> On 14.01.2022 11:05:51, Markus Mirevik wrote:
> > I have performance problems with mcp2518fd. I have custom board based
> > of beagleboard black. (Sitara am335x) using two mcp2518fd. The can
> > communication uses a lot of CPU. irq/64-spi1.1 uses around ~20% at
> > 1000 can messages/s.
> >
> > I have several questions but I'll start with the most confusing. I
> > have found that every can messages fires two interrupts on my board.
> 
> Do you mean every RX'ed CAN frame?
Yes, every received CAN frame.

> Which interrupts does increase twice?

This one: 
64:  4  44e07000.gpio  22 Level     spi1.1

 
> > I have tested this in several ways. If I send one normal can message
> > the counter in /proc/interrupts is increased twice. I have also put
> > some printouts in mcp251xfd-core.c that confirms that mcp251xfd_irq()
> > is run twice and the second time intf_pending is 0.
> 
> You mean mcp251xfd_irq() is started twice?

Yes.

> 
> The big loop
> (https://elixir.bootlin.com/linux/v5.16/source/drivers/net/can/spi/mcp251xf
> d/mcp251xfd-core.c#L2182)
> in mcp251xfd_irq() is run twice, and the IRQ handler is left only if there are
> no pending IRQs.
> 
> The main IRQ handler loop (omitting the rx-int) is straight forward, and not
> optimized:
> - read pending IRQs [*]
> - exit if there are no pending IRQs
> - handle pending IRQs
>   for RX:
>   - read RX-FIFO level [*]
>   - read RX'ed CAN frames [*]
>   - mark RX'ed CAN frames as read [*]
> - loop
> 
> All actions marked with a [*] require a SPI transfer, which results in 5 SPI
> transfers (read pending IRQs is done twice) per RX'ed CAN frame.
> (Assuming there is only 1 RX'ed frame pending and no pending IRQs after
> the first loop.)
> 
> I have some ideas how to optimize this:
> - do a single SPI transfer instead of several small ones:
>   e.g. pending IRQs + RX-FIFO level, or read RX'ed frames + mark as read
> - opportunistically assume RX'ed frame on interrupt and get rid of 1st
>   read pending IRQs
> - assume TX done IRQ, if CAN frames have been TX'ed but not marked as
>   sent hardware
> - coalesce RX IRQs
> - coalesce TX done IRQs
> - combine several TX frames into single SPI transfer
> 

Indeed there is a lot to work on, I'm a bit out of my depth here and as I said the first question is about why it interrupt is fired twice. 
It's not only the big loop that is run twice. 
With rx-int active what I can observe is this:

- read RX-FIFO level [*]
- read RX'ed CAN frames [*]
- mark RX'ed CAN frames as read [*]
- read pending IRQs [*]
- exit if there are no pending IRQs
 -> No Pending IRQ's Exiting. 
 - read pending IRQs [*]
 - exit if there are no pending IRQs
-> No Pending IRQ's Exiting.

> > And for testing purposes I changed the interrupt config to trigger on
> > falling edge instead of level and with this configured there is only
> > one interrupt fired. But I guess this will risk locking the interrupt
> > line low if an edge isn't detected.
> 
> ACK - Please don't use edge triggered IRQs, sooner or later the driver will
> miss an IRQ resulting in a handing driver. Always use level triggered with the
> mcp251xfd.
> 

Yes, I figured that. 

> > I have tried both with rx-int active and inactive. My scope shows
> > really nice signals and that rx-int and int is deactivated
> > simultaneous on incoming frames.
> 
> rx-int is an optimization to skip the first get IRQ pending transfer, as it
> indicates RX'ed CAN frames.

OFFTOPIC:
Shouldn’t a return after the rx-int part be ok to skip the next get IRQ pending transfer as well? 

if (priv->rx_int) {
  do {
    int rx_pending;
    rx_pending = gpiod_get_value_cansleep(priv->rx_int);
    
    if (!rx_pending)
      break;

    err = mcp251xfd_handle(priv, rxif);
    if (err)
      goto out_fail;

    handled = IRQ_HANDLED;
  } while (1);
  
  if (handled == IRQ_HANDLED) {
    return handled;
  }
}
...

Even better would be to check if  (IRQ.VALUE == INACTIVE) but I don know how to that.. 


> 
> > I'm testing by sending messages from my computer through a can dongle.
> >
> > The board is currently running Linux 5.10.79-yocto-tiny #1 SMP Tue Nov
> > 16 03:57:43 UTC 2021 armv7l armv7l armv7l GNU/Linux
> 
> Newer kernels include some optimizations....
> 
> > I've also tested updating the driver to the one from
> > https://github.com/marckleinebudde/linux.git from 2021-12-29 with the
> > same result (IRQ handler run twice).
> 
> The newest production ready code is always net-next/master. But you can
> use the latest official kernel release (v5.16) too.
> 
> > I am profoundly confused by this issue and I can not figure out why
> > it's happening. My understanding is that since the IRQ handler is
> > loaded with IRQF_ONESHOT the irq line should be masked until
> > mcp251xfd_irq() returns.
> 
> ACK.
> 
> > Since mcp251xfd_irq() checks that !rx_pending before exiting the int
> > signal must be high. And the interrupt should not fire again...
> 
> The rx-int GPIO:
> 
> | microchip,rx-int-gpios = <&gpio1 8 GPIO_ACTIVE_LOW>;
> 
> is not used as an interrupt. The only interrupt line is:
> 
> | interrupts-extended = <&gpio2 5 IRQ_TYPE_EDGE_FALLING>;
> 
> If there is an interrupt the gpio2_5 will get active, if this is an RX IRQ the
> gpio1_8 will get active, too. But only gpio2_5 will trigger the interrupt
> handler.
> 
> In the interrupt handler, as rx-int is available and active the RX handling is
> done first. If there are no more RX IRQs pending, the rx-int goes inactive and
> the big loop in the IRQ handler will be handled: read rx-pending IRQs,
> probably none pending, then leave IRQ handler.
> 

Ok.

> > Result from init:
> > [    4.003029] mcp251xfd spi1.0 can0: MCP2518FD rev0.0 (+RX_INT -
> MAB_NO_WARN +CRC_REG +CRC_RX +CRC_TX +ECC -HD c:40.00MHz
> m:20.00MHz r:17.00MHz e:0.00MHz) successfully initialized.
> > [    4.028220] mcp251xfd spi1.1 can1: MCP2518FD rev0.0 (-RX_INT -
> MAB_NO_WARN +CRC_REG +CRC_RX +CRC_TX +ECC -HD c:40.00MHz
> m:20.00MHz r:17.00MHz e:0.00MHz) successfully initialized.
> >
> > This is the current dtsi part for the canFD chips. (With rx-pin removed on
> one of the devices and trigger on edge on the other for debugging
> purposes).
> >
> > #include <dt-bindings/gpio/gpio.h>
> > #include <dt-bindings/interrupt-controller/irq.h>
> >
> > &am33xx_pinmux{
> >         pinctrl_spi1_pins: pinctrl_spi1_pins {
> >                 pinctrl-single,pins = <
> >                         AM33XX_IOPAD(0x990, PIN_INPUT | MUX_MODE3) /* (A13)
> mcasp0_aclkx.spi1_sclk */
> >                         AM33XX_IOPAD(0x994, PIN_INPUT | MUX_MODE3) /* (B13)
> mcasp0_fsx.spi1_d0 */
> >                         AM33XX_IOPAD(0x998, PIN_INPUT | MUX_MODE3) /* (D12)
> mcasp0_axr0.spi1_d1 */
> >                         AM33XX_IOPAD(0x96c, PIN_OUTPUT_PULLUP | MUX_MODE5)
> /* (E17) uart0_rtsn.spi1_cs0         CleANopen       LEFT*/
> >                         AM33XX_IOPAD(0x9b0, PIN_OUTPUT_PULLUP |
> MUX_MODE4) /* (A15) xdma_event_intr0.spi1_cs1   SAWM CAN
> RIGHT*/
> >                 >;
> >         };
> >
> >         can0_int_pins: can0_int_pins {
> >                 pinctrl-single,pins = <
> >                 /*CleANopen*/
> >                 AM33XX_IOPAD(0x89c, PIN_INPUT_PULLUP | MUX_MODE7) /*
> (T6) gpmc_be0n_cle.gpio2[5]        nINT            */
> >                 AM33XX_IOPAD(0x968, PIN_INPUT_PULLUP | MUX_MODE7) /*
> (E18) uart0_ctsn.gpio1[8]          nINT1           */
> >                 >;
> >         };
> >
> >         can1_int_pins: can1_int_pins {
> >                 pinctrl-single,pins = <
> >                 /*SAWM CAN*/
> >                 AM33XX_IOPAD(0x820, PIN_INPUT_PULLUP | MUX_MODE7) /*
> (U10) gpmc_ad8.gpio0[22]           nINT            */
> >                 AM33XX_IOPAD(0x8c8, PIN_INPUT_PULLUP | MUX_MODE7) /*
> (U3) lcd_data10.gpio2[16]  nINT1           */
> >                 AM33XX_IOPAD(0x8cc, PIN_INPUT_PULLUP | MUX_MODE7) /*
> (U4) lcd_data11.gpio2[17]  nINT0 NOT USED  */
> >                 >;
> >         };
> > };
> >
> >
> >
> > /{
> >         /* external 40M oscillator of mcp2518fd on SPI1.0 */
> >         mcp2518fd_can0_osc: mcp2518fd_can0_osc {
> >                 compatible = "fixed-clock";
> >                 #clock-cells = <0>;
> >                 clock-frequency = <40000000>;
> >         };
> > };
> >
> >
> > /{
> >         /* external 40M oscillator of mcp2518fd on SPI1.1 */
> >         mcp2518fd_can1_osc: mcp2518fd_can1_osc {
> >                 compatible = "fixed-clock";
> >                 #clock-cells = <0>;
> >                 clock-frequency = <40000000>;
> >         };
> > };
> >
> > /* the spi config of the can-controller itself binding everything
> > together */ &spi1{
> >     #address-cells = <1>;
> >     #size-cells = <0>;
> >
> >     status = "okay";
> >     pinctrl-names = "default";
> >     pinctrl-0 = <&pinctrl_spi1_pins>;
> >
> >     /*CleANopen*/
> >     can@0 {
> >         compatible = "microchip,mcp2518fd";
> >         reg = <0>;
> >         clocks = <&mcp2518fd_can0_osc>;
> >         pinctrl-names = "default";
> >         pinctrl-0 = <&can0_int_pins>;
> >         spi-max-frequency = <20000000>;
> >         interrupts-extended = <&gpio2 5 IRQ_TYPE_EDGE_FALLING>;
> 
> See comment about edge IRQs above!

ACK.

> 
> >         microchip,rx-int-gpios = <&gpio1 8 GPIO_ACTIVE_LOW>;
> >     };
> >
> >     can@1 {
> >         compatible = "microchip,mcp2518fd";
> >         reg = <1>;
> >         clocks = <&mcp2518fd_can1_osc>;
> >         pinctrl-names = "default";
> >         pinctrl-0 = <&can1_int_pins>;
> >         spi-max-frequency = <20000000>;
> >         interrupts-extended = <&gpio0 22 IRQ_TYPE_LEVEL_LOW>;
> > //        microchip,rx-int-gpios = <&gpio2 16 GPIO_ACTIVE_LOW>;
> >     };
> > };
> 
> Can you test again with IRQ_TYPE_LEVEL_LOW and no rx-int-gpios. Please
> instrument the beginning and the returns of the mcp251xfd_irq() function to
> check if it's really started twice. Please print the
> priv->regs_status.intf in every loop. Can you record the gpio2_5 with a
> scope.

Yes I can but what do you mean with Instrument? 
And no scope until Monday. 


> Make sure that you don't send any CAN frames as reaction of reception.

How do I make sure of that? :/ 


> 
> regards,
> Marc
> 
> --
> Pengutronix e.K.                 | Marc Kleine-Budde           |
> Embedded Linux                   | https://www.pengutronix.de  |
> Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
> Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

Thank you for your time
Markus 




[Index of Archives]     [Automotive Discussions]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [CAN Bus]

  Powered by Linux