Re: MCP251xFD runs interrupt handler twice per message.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 14.01.2022 11:05:51, Markus Mirevik wrote:
> I have performance problems with mcp2518fd. I have custom board based
> of beagleboard black. (Sitara am335x) using two mcp2518fd. The can
> communication uses a lot of CPU. irq/64-spi1.1 uses around ~20% at
> 1000 can messages/s.
> 
> I have several questions but I'll start with the most confusing. I
> have found that every can messages fires two interrupts on my board.

Do you mean every RX'ed CAN frame?
Which interrupts does increase twice?

> I have tested this in several ways. If I send one normal can message
> the counter in /proc/interrupts is increased twice. I have also put
> some printouts in mcp251xfd-core.c that confirms that mcp251xfd_irq()
> is run twice and the second time intf_pending is 0.

You mean mcp251xfd_irq() is started twice?

The big loop
(https://elixir.bootlin.com/linux/v5.16/source/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c#L2182)
in mcp251xfd_irq() is run twice, and the IRQ handler is left only if
there are no pending IRQs.

The main IRQ handler loop (omitting the rx-int) is straight forward, and
not optimized:
- read pending IRQs [*]
- exit if there are no pending IRQs
- handle pending IRQs
  for RX:
  - read RX-FIFO level [*]
  - read RX'ed CAN frames [*]
  - mark RX'ed CAN frames as read [*]
- loop

All actions marked with a [*] require a SPI transfer, which results in 5
SPI transfers (read pending IRQs is done twice) per RX'ed CAN frame.
(Assuming there is only 1 RX'ed frame pending and no pending IRQs after
 the first loop.)

I have some ideas how to optimize this:
- do a single SPI transfer instead of several small ones:
  e.g. pending IRQs + RX-FIFO level, or read RX'ed frames + mark as read
- opportunistically assume RX'ed frame on interrupt and get rid of 1st
  read pending IRQs
- assume TX done IRQ, if CAN frames have been TX'ed but not marked as
  sent hardware
- coalesce RX IRQs
- coalesce TX done IRQs
- combine several TX frames into single SPI transfer

> And for testing purposes I changed the interrupt config to trigger on
> falling edge instead of level and with this configured there is only
> one interrupt fired. But I guess this will risk locking the interrupt
> line low if an edge isn't detected.

ACK - Please don't use edge triggered IRQs, sooner or later the driver
will miss an IRQ resulting in a handing driver. Always use level
triggered with the mcp251xfd.

> I have tried both with rx-int active and inactive. My scope shows
> really nice signals and that rx-int and int is deactivated
> simultaneous on incoming frames.

rx-int is an optimization to skip the first get IRQ pending transfer, as
it indicates RX'ed CAN frames.

> I'm testing by sending messages from my computer through a can dongle.
> 
> The board is currently running Linux 5.10.79-yocto-tiny #1 SMP Tue Nov
> 16 03:57:43 UTC 2021 armv7l armv7l armv7l GNU/Linux

Newer kernels include some optimizations....

> I've also tested updating the driver to the one from
> https://github.com/marckleinebudde/linux.git from 2021-12-29 with the
> same result (IRQ handler run twice).

The newest production ready code is always net-next/master. But you can
use the latest official kernel release (v5.16) too.

> I am profoundly confused by this issue and I can not figure out why
> it's happening. My understanding is that since the IRQ handler is
> loaded with IRQF_ONESHOT the irq line should be masked until
> mcp251xfd_irq() returns.

ACK.

> Since mcp251xfd_irq() checks that !rx_pending before exiting the int
> signal must be high. And the interrupt should not fire again...

The rx-int GPIO:

| microchip,rx-int-gpios = <&gpio1 8 GPIO_ACTIVE_LOW>;

is not used as an interrupt. The only interrupt line is:

| interrupts-extended = <&gpio2 5 IRQ_TYPE_EDGE_FALLING>;

If there is an interrupt the gpio2_5 will get active, if this is an RX
IRQ the gpio1_8 will get active, too. But only gpio2_5 will trigger the
interrupt handler.

In the interrupt handler, as rx-int is available and active the RX
handling is done first. If there are no more RX IRQs pending, the rx-int
goes inactive and the big loop in the IRQ handler will be handled: read
rx-pending IRQs, probably none pending, then leave IRQ handler.

> Result from init:
> [    4.003029] mcp251xfd spi1.0 can0: MCP2518FD rev0.0 (+RX_INT -MAB_NO_WARN +CRC_REG +CRC_RX +CRC_TX +ECC -HD c:40.00MHz m:20.00MHz r:17.00MHz e:0.00MHz) successfully initialized.
> [    4.028220] mcp251xfd spi1.1 can1: MCP2518FD rev0.0 (-RX_INT -MAB_NO_WARN +CRC_REG +CRC_RX +CRC_TX +ECC -HD c:40.00MHz m:20.00MHz r:17.00MHz e:0.00MHz) successfully initialized.
> 
> This is the current dtsi part for the canFD chips. (With rx-pin removed on one of the devices and trigger on edge on the other for debugging purposes). 
> 
> #include <dt-bindings/gpio/gpio.h>
> #include <dt-bindings/interrupt-controller/irq.h>
> 
> &am33xx_pinmux{
>         pinctrl_spi1_pins: pinctrl_spi1_pins {
>                 pinctrl-single,pins = <
>                         AM33XX_IOPAD(0x990, PIN_INPUT | MUX_MODE3) /* (A13) mcasp0_aclkx.spi1_sclk */
>                         AM33XX_IOPAD(0x994, PIN_INPUT | MUX_MODE3) /* (B13) mcasp0_fsx.spi1_d0 */
>                         AM33XX_IOPAD(0x998, PIN_INPUT | MUX_MODE3) /* (D12) mcasp0_axr0.spi1_d1 */
>                         AM33XX_IOPAD(0x96c, PIN_OUTPUT_PULLUP | MUX_MODE5) /* (E17) uart0_rtsn.spi1_cs0         CleANopen       LEFT*/
>                         AM33XX_IOPAD(0x9b0, PIN_OUTPUT_PULLUP | MUX_MODE4) /* (A15) xdma_event_intr0.spi1_cs1   SAWM CAN        RIGHT*/
>                 >;
>         };
> 
>         can0_int_pins: can0_int_pins {
>                 pinctrl-single,pins = <
>                 /*CleANopen*/
>                 AM33XX_IOPAD(0x89c, PIN_INPUT_PULLUP | MUX_MODE7) /* (T6) gpmc_be0n_cle.gpio2[5]        nINT            */
>                 AM33XX_IOPAD(0x968, PIN_INPUT_PULLUP | MUX_MODE7) /* (E18) uart0_ctsn.gpio1[8]          nINT1           */
>                 >;
>         };
> 
>         can1_int_pins: can1_int_pins {
>                 pinctrl-single,pins = <
>                 /*SAWM CAN*/
>                 AM33XX_IOPAD(0x820, PIN_INPUT_PULLUP | MUX_MODE7) /* (U10) gpmc_ad8.gpio0[22]           nINT            */
>                 AM33XX_IOPAD(0x8c8, PIN_INPUT_PULLUP | MUX_MODE7) /* (U3) lcd_data10.gpio2[16]  nINT1           */
>                 AM33XX_IOPAD(0x8cc, PIN_INPUT_PULLUP | MUX_MODE7) /* (U4) lcd_data11.gpio2[17]  nINT0 NOT USED  */
>                 >;
>         };
> };
> 
> 
> 
> /{
>         /* external 40M oscillator of mcp2518fd on SPI1.0 */
>         mcp2518fd_can0_osc: mcp2518fd_can0_osc {
>                 compatible = "fixed-clock";
>                 #clock-cells = <0>;
>                 clock-frequency = <40000000>;
>         };
> };
> 
> 
> /{
>         /* external 40M oscillator of mcp2518fd on SPI1.1 */
>         mcp2518fd_can1_osc: mcp2518fd_can1_osc {
>                 compatible = "fixed-clock";
>                 #clock-cells = <0>;
>                 clock-frequency = <40000000>;
>         };
> };
> 
> /* the spi config of the can-controller itself binding everything together */
> &spi1{
>     #address-cells = <1>;
>     #size-cells = <0>;
> 
>     status = "okay";
>     pinctrl-names = "default";
>     pinctrl-0 = <&pinctrl_spi1_pins>;
> 
>     /*CleANopen*/
>     can@0 {
>         compatible = "microchip,mcp2518fd";
>         reg = <0>;
>         clocks = <&mcp2518fd_can0_osc>;
>         pinctrl-names = "default";
>         pinctrl-0 = <&can0_int_pins>;
>         spi-max-frequency = <20000000>;
>         interrupts-extended = <&gpio2 5 IRQ_TYPE_EDGE_FALLING>;

See comment about edge IRQs above!

>         microchip,rx-int-gpios = <&gpio1 8 GPIO_ACTIVE_LOW>;
>     };
> 
>     can@1 {
>         compatible = "microchip,mcp2518fd";
>         reg = <1>;
>         clocks = <&mcp2518fd_can1_osc>;
>         pinctrl-names = "default";
>         pinctrl-0 = <&can1_int_pins>;
>         spi-max-frequency = <20000000>;
>         interrupts-extended = <&gpio0 22 IRQ_TYPE_LEVEL_LOW>;
> //        microchip,rx-int-gpios = <&gpio2 16 GPIO_ACTIVE_LOW>;
>     };
> };

Can you test again with IRQ_TYPE_LEVEL_LOW and no rx-int-gpios. Please
instrument the beginning and the returns of the mcp251xfd_irq() function
to check if it's really started twice. Please print the
priv->regs_status.intf in every loop. Can you record the gpio2_5 with a
scope.

Make sure that you don't send any CAN frames as reaction of reception.

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Automotive Discussions]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [CAN Bus]

  Powered by Linux