Re: MCP2518FD Drivers Rarely Working with Custom Kernel 5.10.Y

Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx> · Fri, 25 Jun 2021 14:16:48 +0200

On 25.06.2021 08:56:26, Marc Kleine-Budde wrote:
> On 23.06.2021 13:34:10, Joshua Quesenberry wrote:
> > I added printing the error number as Marc suggested and the number
> > appears to be -110 every time.
> 
> #define	ETIMEDOUT	110	/* Connection timed out */
> https://elixir.bootlin.com/linux/latest/source/include/uapi/asm-generic/errno.h#L93
> 
> That means something has timed out, we see this in the previous log
> message, too:
> 
> > [   25.660006] CAN device driver interface
> > [   25.668720] spi_master spi0: will run message pump with realtime priority
> > [   25.676697] mcp251xfd spi0.1 can0: MCP2518FD rev0.0 (-RX_INT -MAB_NO_WARN +CRC_REG +CRC_RX +CRC_TX +ECC -HD c:40.00MHz m:20.00MHz r:17.00MHz e:16.66MHz) successfully initialized.
> > [   25.684900] mcp251xfd spi0.0 can1: MCP2518FD rev0.0 (-RX_INT -MAB_NO_WARN +CRC_REG +CRC_RX +CRC_TX +ECC -HD c:40.00MHz m:20.00MHz r:17.00MHz e:16.66MHz) successfully initialized.
> > [   28.098033] mcp251xfd spi0.1 rename4: renamed from can0
> > [   28.175644] mcp251xfd spi0.0 can0: renamed from can1
> > [   28.225891] mcp251xfd spi0.1 can1: renamed from rename4
>                                    VVVVVVVVVVVVVVVVVVVVVV
> > [  146.964971] mcp251xfd spi0.0: SPI transfer timed out
> > [  146.965023] spi_master spi0: failed to transfer one message from queue (ret=-110)
> 
> > [  146.965216] mcp251xfd spi0.0 can0: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
> > [  146.965247] mcp251xfd spi0.0 can0: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
> > [  146.965277] mcp251xfd spi0.0 can0: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000) retrying.
> > [  146.965286] mcp251xfd spi0.0 can0: CRC read error at address 0x0e0c (length=4, data=00 00 00 00, CRC=0x0000).
> > [  146.965331] mcp251xfd spi0.0 can0: CRC read error at address 0x0000 (length=4, data=00 00 00 00, CRC=0x0000) retrying.
> > [  146.965360] mcp251xfd spi0.0 can0: CRC read error at address 0x0000 (length=4, data=00 00 00 00, CRC=0x0000) retrying.
> > [  146.965389] mcp251xfd spi0.0 can0: CRC read error at address 0x0000 (length=4, data=00 00 00 00, CRC=0x0000) retrying.
> > [  146.965397] mcp251xfd spi0.0 can0: CRC read error at address 0x0000 (length=4, data=00 00 00 00, CRC=0x0000).
> > [  146.965413] A link change request failed with some changes committed already. Interface can0 may have been left with an inconsistent configuration, please check.
> > 
> > Regarding the discussion about Kconfig flags, I went ahead and rebuilt
> > kernel 5.10.44 using a config that was essentially
> > arch/arm/configs/bcm2711_defconfig with these additions needed to get
> > our I2S working. This should have undone the switch to ONDEMAND
> > governor and enabling 1000 Hz clock.
> 
> Please switch back the clock to the standard HZ setting.

I compiled my 64 bit raspi kernel with HZ=1000 and my mcp2518fd board
works without problem on my raspi4b.

| static int bcm2835_spi_transfer_one_poll(struct spi_controller *ctlr,
| 					 struct spi_device *spi,
| 					 struct spi_transfer *tfr,
| 					 u32 cs)
| {
[...]
| 	/* set the timeout to at least 2 jiffies */
| 	timeout = jiffies + 2 + HZ * polling_limit_us / 1000000;

The timeout is calculated in jiffies. The jiffies variable is
incremented once per timer tick (which depends on the clock HZ
configuration). There are "HZ" jiffies per second. This means the above
"2" equals 8ms (HZ=250), but with HZ=1000 only 2ms.

To keep the timeout constant, you can change this into:

        timeout = jiffies + (HZ * 8) / 1000 + HZ * polling_limit_us / 1000000;

However, the polling mode will only be used for transfers that should
finish in 30 µs. So even 2ms is far of...

| 
| 	/* loop until finished the transfer */
| 	while (bs->rx_len) {
| 		/* fill in tx fifo with remaining data */
| 		bcm2835_wr_fifo(bs);
| 
| 		/* read from fifo as much as possible */
| 		bcm2835_rd_fifo(bs);
| 
| 		/* if there is still data pending to read
| 		 * then check the timeout
| 		 */
| 		if (bs->rx_len && time_after(jiffies, timeout)) {

If there is a timeout, the driver will fall back to IRQ mode.

Can you add a "#define DEBUG" in spi-bcm2835.c, even before the other
"#include" directives. That should give you this debug message:

| 			dev_dbg_ratelimited(&spi->dev,
| 					    "timeout period reached: jiffies: %lu remaining tx/rx: %d/%d - falling back to interrupt mode\n",
| 					    jiffies - timeout,
| 					    bs->tx_len, bs->rx_len);
| 			/* fall back to interrupt mode */
| 
| 			/* update usage statistics */
| 			bs->count_transfer_irq_after_polling++;
| 
| 			return bcm2835_spi_transfer_one_irq(ctlr, spi,
| 							    tfr, cs, false);

here it activates the IRQ. But I'm not sure if the fallback works
correctly....

| 		}
| 	}
| 

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |
Attachment:
signature.asc

Description: PGP signature