Hi Thomas,

> Do I read the pdf correctly (based on the /var/log stuff) that you have two MCP2518FD connected to a Pi4B and
> both of them are running in internal/external loopback mode no interaction between them and the SPIs are separate?

Yes. For the loopback test the CANs are separate. Errors also occur when sending messages between the controllers, but I think that is more difficult to analyze.

root@raspberrypi:~# ip -d -s a s can0
4: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UP group default qlen 1000
    link/can  promiscuity 0 minmtu 0 maxmtu 0
    can <LOOPBACK,BERR-REPORTING,FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 1000000 sample-point 0.800
          tq 25 prop-seg 15 phase-seg1 16 phase-seg2 8 sjw 6
          mcp251xfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
          dbitrate 4000000 dsample-point 0.800
          dtq 25 dprop-seg 3 dphase-seg1 4 dphase-seg2 2 dsjw 2
          mcp251xfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
          clock 40000000
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          0          0          0          0          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets    errors  dropped missed  mcast
    3097429872 2517466658 0       0       0       0
    TX: bytes  packets    errors  dropped carrier collsns
    3696197680 1258733264 0       0       0       0

root@raspberrypi:~# ip -d -s a s can1
5: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UP group default qlen 1000
    link/can  promiscuity 0 minmtu 0 maxmtu 0
    can <LOOPBACK,BERR-REPORTING,FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 1000000 sample-point 0.800
          tq 25 prop-seg 15 phase-seg1 16 phase-seg2 8 sjw 6
          mcp251xfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
          dbitrate 4000000 dsample-point 0.800
          dtq 25 dprop-seg 3 dphase-seg1 4 dphase-seg2 2 dsjw 2
          mcp251xfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
          clock 40000000
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          112256     0          1          3          1         numtxqueues 1 numrxqueues 1
          gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets    errors  dropped missed  mcast
    2590649888 773910314  3       0       0       0
    TX: bytes  packets    errors  dropped carrier collsns
    3442619696 386944310  112254  12      0       0

This is the SPI setup for the dual CAN PI:

dtparam=spi=on
dtoverlay=spi6-1cs
# mcp251xfd DTS for RPI4 default CAN on SPI6.0
dtoverlay=mcp251xfd-spi6-0,interrupt_pin=25,oscillator=40000000
# mcp251xfd DTS for RPI4 CAN extension on SPI0.0
dtoverlay=mcp251xfd,spi0-0,interrupt=16,oscillator=40000000

Or did you mean something else with "script"?

I'll try the register dump when I suffer the next error ;-)

Best regards,
Stefan

-----Original Message-----
From: Thomas.Kopp@xxxxxxxxxxxxx <Thomas.Kopp@xxxxxxxxxxxxx>
Sent: Thursday, 22 December 2022 10:07
To: Stefan Althöfer <Stefan.Althoefer@xxxxxxxxxxx>
Cc: linux-can@xxxxxxxxxxxxxxx
Subject: RE: mcp251xfd: Bad message receiption

Hi Stefan,

> I have reduced my test case to a simple single thread self-receipt test:
> * TX two messages
> * Wait for RX and send out a new message on every receipt
> * TX for messages in total
>
> Refer to the attached PDF for some error cases. Last send frames are
> at the top of the logs. You can see that wrong messages appear in the
> RX queue, which have been successfully transmitted in previous test
> loop. The data that is actually sent out is correct however (checked
> with an external logger for some cases).

Do I read the pdf correctly (based on the /var/log stuff) that you have two MCP2518FD connected to a Pi4B and both of them are running in internal/external loopback mode no interaction between them and the SPIs are separate?
What are your CAN interface settings? Would it be possible to share the script?

> I see infrequent mcp251xfd CRC read errors. I think those are due to
> the 2518 SPI errata. However they don't occur at the time when the
> wrong messages are received (refer to the PDF).

Correct, this shouldn't be related to your problem.

> - Any suggestion how I can step further in fixing this issue.

One thing would be to dump the RAM i.e. the content of the fifos itself to see whether the device actually has the incorrect frames. Marc wrote a tool to dump registers and RAM via debugfs:
https://github.com/linux-can/can-utils/blob/master/mcp251xfd/mcp251xfd-dump.c

For this debugfs needs to be enabled and mounted (e.g. $mount -t debugfs none /sys/kernel/debug)

Now the registers can be dumped like this:
cat /sys/kernel/debug/regmap/spi0.0-crc/registers

So I'd suggest to abort the script after the first error occurred and then dump registers/ram to find the RX fifo in question and check the content.

Best Regards,
Thomas
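
For the "abort on first error, then dump" step, a small helper can make two dumps easier to compare. The following is only a rough sketch and not part of can-utils: it assumes the regmap debugfs file contains one "address: value" pair per line in hexadecimal, and the names parse_regmap_dump, diff_dumps and read_live_dump are made up for illustration. The path is the one from the mail above; the spi0.0 part depends on which SPI bus and chip select the controller sits on.

```python
import re
from pathlib import Path

# Path as given in the mail; adjust "spi0.0" to the bus/chip select in use.
REGMAP_PATH = Path("/sys/kernel/debug/regmap/spi0.0-crc/registers")

# Assumed line format: "<hex address>: <hex value>".
LINE_RE = re.compile(r"^\s*([0-9a-fA-F]+):\s*([0-9a-fA-F]+)\s*$")


def parse_regmap_dump(text):
    """Return {address: value} for every line that parses as hex.

    Lines that do not match (for example a value printed as XXXXXXXX
    for an unreadable register) are skipped.
    """
    regs = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            regs[int(match.group(1), 16)] = int(match.group(2), 16)
    return regs


def diff_dumps(before, after):
    """Return {address: (old, new)} for addresses whose value changed.

    Addresses missing from `before` are reported with old = None.
    """
    return {
        addr: (before.get(addr), value)
        for addr, value in after.items()
        if before.get(addr) != value
    }


def read_live_dump(path=REGMAP_PATH):
    """Read and parse the dump from debugfs (needs root and mounted debugfs)."""
    return parse_regmap_dump(Path(path).read_text())
```

One possible use: call read_live_dump() once while the test is running fine and once right after the first bad frame, then print diff_dumps(good, bad) to narrow down which addresses to look at with mcp251xfd-dump.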