RE: mcp251xfd: Bad message reception

Stefan Althöfer <Stefan.Althoefer@xxxxxxxxxxx> · Thu, 22 Dec 2022 10:30:22 +0000

Hi Thomas,

> Do I read the PDF correctly (based on the /var/log stuff) that you have two MCP2518FD connected to a Pi4B,
> both running in internal/external loopback mode with no interaction between them, and that the SPIs are separate?

Yes. For the loopback test the CANs are separate. Errors also occur when sending messages between
the controllers, but I think that is more difficult to analyze.

root@raspberrypi:~# ip -d -s a s can0
4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UP group default qlen 1000
    link/can  promiscuity 0 minmtu 0 maxmtu 0
    can <LOOPBACK,BERR-REPORTING,FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 1000000 sample-point 0.800
          tq 25 prop-seg 15 phase-seg1 16 phase-seg2 8 sjw 6
          mcp251xfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
          dbitrate 4000000 dsample-point 0.800
          dtq 25 dprop-seg 3 dphase-seg1 4 dphase-seg2 2 dsjw 2
          mcp251xfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
          clock 40000000
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          0          0          0          0          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets  errors  dropped missed  mcast
    3097429872 2517466658 0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    3696197680 1258733264 0       0       0       0
root@raspberrypi:~# ip -d -s a s can1
5: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UP group default qlen 1000
    link/can  promiscuity 0 minmtu 0 maxmtu 0
    can <LOOPBACK,BERR-REPORTING,FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 1000000 sample-point 0.800
          tq 25 prop-seg 15 phase-seg1 16 phase-seg2 8 sjw 6
          mcp251xfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
          dbitrate 4000000 dsample-point 0.800
          dtq 25 dprop-seg 3 dphase-seg1 4 dphase-seg2 2 dsjw 2
          mcp251xfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
          clock 40000000
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          112256     0          1          3          1         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets  errors  dropped missed  mcast
    2590649888 773910314 3       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    3442619696 386944310 112254  12      0       0
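
As a sanity check on the timing reported above, the bitrate and sample point can be recomputed from the segment values. This is only a sketch of the standard CAN bit-time arithmetic (one sync quantum plus the listed segments); the class and method names are made up for illustration and are not part of iproute2 or the driver.

```python
from dataclasses import dataclass


@dataclass
class BitTiming:
    """Segment values as printed by `ip -d link show` (names are illustrative)."""
    tq_ns: int        # time quantum in nanoseconds
    prop_seg: int
    phase_seg1: int
    phase_seg2: int

    def quanta_per_bit(self) -> int:
        # One bit = 1 tq sync segment + prop_seg + phase_seg1 + phase_seg2
        return 1 + self.prop_seg + self.phase_seg1 + self.phase_seg2

    def bitrate(self) -> float:
        # Bit time in ns is tq * quanta; bitrate is its reciprocal in bit/s
        return 1e9 / (self.tq_ns * self.quanta_per_bit())

    def sample_point(self) -> float:
        # The sample point sits at the end of phase_seg1
        return (1 + self.prop_seg + self.phase_seg1) / self.quanta_per_bit()


# Values copied from the can0/can1 output above
nominal = BitTiming(tq_ns=25, prop_seg=15, phase_seg1=16, phase_seg2=8)
data = BitTiming(tq_ns=25, prop_seg=3, phase_seg1=4, phase_seg2=2)

print(nominal.bitrate(), nominal.sample_point())  # 1000000.0 0.8
print(data.bitrate(), data.sample_point())        # 4000000.0 0.8
```

Both results agree with the bitrate/dbitrate and sample-point lines shown by ip, so the timing configuration itself looks consistent on both interfaces.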

This is the SPI setup for the dual-CAN Pi:

dtparam=spi=on
dtoverlay=spi6-1cs
# mcp251xfd DTS for RPI4 default CAN on SPI6.0
dtoverlay=mcp251xfd-spi6-0,interrupt_pin=25,oscillator=40000000
# mcp251xfd DTS for RPI4 CAN extension on SPI0.0
dtoverlay=mcp251xfd,spi0-0,interrupt=16,oscillator=40000000

Or did you mean something else by "script"?

I'll try the register dump when I hit the next error ;-)

Best regards
Stefan

-----Original Message-----
From: Thomas.Kopp@xxxxxxxxxxxxx <Thomas.Kopp@xxxxxxxxxxxxx> 
Sent: Thursday, 22 December 2022 10:07
To: Stefan Althöfer <Stefan.Althoefer@xxxxxxxxxxx>
Cc: linux-can@xxxxxxxxxxxxxxx
Subject: RE: mcp251xfd: Bad message reception

Hi Stefan,

> I have reduced my test case to a simple single thread self-receipt test:
>     * TX two messages
>     * Wait for RX and send out a new message on every receipt
>     * TX for messages in total
> 
> Refer to the attached PDF for some error cases. Last send frames are 
> at the top of the logs. You can see that wrong messages appear in the 
> RX queue, which have been successfully transmitted in previous test 
> loop. The data that is actually sent out is correct however (checked 
> with an external logger for some cases).
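
For readers without the original script, the test described above could look roughly like the following sketch using Python's raw SocketCAN support. It is not Stefan's script: the CAN ID, the 8-byte sequence-number payload, the helper names and the `total` parameter are all assumptions. It also assumes the interface is already up in FD mode with controller loopback enabled, so that the sending socket sees its own frames as ordinary RX frames.

```python
import socket
import struct

# Layout of struct canfd_frame: can_id, len, flags, 2 reserved bytes, data[64]
CANFD_FRAME_FMT = "=IBB2x64s"
CANFD_FRAME_SIZE = struct.calcsize(CANFD_FRAME_FMT)  # 72 bytes, cf. "mtu 72"
SFF_MASK = 0x7FF  # standard 11-bit identifier mask


def make_payload(seq: int) -> bytes:
    # Hypothetical payload: the sequence number as 8 little-endian bytes
    return struct.pack("<Q", seq)


def pack_frame(can_id: int, payload: bytes, flags: int = 0) -> bytes:
    return struct.pack(CANFD_FRAME_FMT, can_id, len(payload), flags, payload)


def unpack_frame(raw: bytes):
    # Classic 16-byte frames share the header layout, so pad them to FD size
    raw = raw.ljust(CANFD_FRAME_SIZE, b"\x00")
    can_id, length, _flags, data = struct.unpack(CANFD_FRAME_FMT, raw)
    return can_id & SFF_MASK, data[:length]


def run_self_test(ifname: str, total: int, can_id: int = 0x123) -> int:
    """Return the index of the first unexpected RX frame, or -1 if all match."""
    with socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) as s:
        s.setsockopt(socket.SOL_CAN_RAW, socket.CAN_RAW_FD_FRAMES, 1)
        s.bind((ifname,))
        sent = 0
        # Prime the pipeline with two frames, as in the description
        for _ in range(min(2, total)):
            s.send(pack_frame(can_id, make_payload(sent)))
            sent += 1
        for expected in range(total):
            rx_id, data = unpack_frame(s.recv(CANFD_FRAME_SIZE))
            if rx_id != can_id or data != make_payload(expected):
                return expected  # stop at the first bad frame
            if sent < total:
                # Send one new frame for every frame received
                s.send(pack_frame(can_id, make_payload(sent)))
                sent += 1
    return -1
```

A call such as run_self_test("can0", 1000) returns at the first mismatch, which leaves the controller state untouched for a register or RAM dump.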

Do I read the PDF correctly (based on the /var/log stuff) that you have two MCP2518FD connected to a Pi4B, both running in internal/external loopback mode with no interaction between them, and that the SPIs are separate?
What are your CAN interface settings? Would it be possible to share the script?

> I see infrequent mcp251xfd CRC read errors. I think those are due to 
> the 2518 SPI errata. However they don't occur at the time when the 
> wrong messages are received (refer to the PDF).
Correct, this shouldn't be related to your problem.

> - Any suggestion how I can step further in fixing this issue.
One thing would be to dump the RAM, i.e. the contents of the FIFOs themselves, to see whether the device actually holds the incorrect frames. Marc wrote a tool to dump registers and RAM via debugfs:
https://github.com/linux-can/can-utils/blob/master/mcp251xfd/mcp251xfd-dump.c

For this, debugfs needs to be enabled and mounted (e.g. mount -t debugfs none /sys/kernel/debug).

Now the registers can be dumped like this: cat /sys/kernel/debug/regmap/spi0.0-crc/registers

So I'd suggest aborting the script as soon as the first error occurs, then dumping the registers/RAM to find the RX FIFO in question and check its contents.
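
A small helper along these lines could capture the regmap dump and compare two snapshots, for example one taken before the test and one right after the first error. This is a sketch under assumptions: it expects the usual regmap debugfs layout of one "address: value" pair per line in hex, and the function names are made up. Decoding the FIFO contents is left to mcp251xfd-dump.

```python
import re
from pathlib import Path

# Path taken from the mail above; adjust for the other SPI device
REGISTERS_PATH = Path("/sys/kernel/debug/regmap/spi0.0-crc/registers")

# Assumed line format: "<hex address>: <hex value>"
LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+):\s*([0-9a-fA-F]+)\s*$")


def parse_registers(text: str) -> dict:
    """Map register address -> value; lines that do not match are skipped."""
    regs = {}
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            regs[int(m.group(1), 16)] = int(m.group(2), 16)
    return regs


def read_registers(path: Path = REGISTERS_PATH) -> dict:
    # Needs root and a mounted debugfs
    return parse_registers(path.read_text())


def diff_registers(before: dict, after: dict) -> dict:
    """Return {address: (old, new)} for every register whose value changed."""
    return {
        addr: (before.get(addr), value)
        for addr, value in after.items()
        if before.get(addr) != value
    }
```

Typical use would be to call read_registers() once before the test starts and once after it aborts, then print diff_registers(before, after) to narrow down which FIFO registers moved.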

Best Regards,
Thomas