On 11.01.2023 23:20:37, Marc Kleine-Budde wrote:
> this is a proof of concept implementation to work around the
> "double-RX" erratum found by Stefan Althöfer.
>
> With the help of Thomas we found out that the chip has a time window
> after receiving a CAN frame where the RX FIFO STA register content is
> not read correctly.
>
> From the driver's point of view, everything looks consistent at first,
> but the head index of the chip is too large. This causes the driver to
> rehandle old CAN frames that have already been processed.
>
> The workaround uses the RX timestamp to distinguish between new and
> old data. As soon as old data is found, processing is stopped.
>
> The series applies against current net/main. The patches lack proper
> descriptions, I'll add them in the next round.
>
> Happy testing,
> Marc
>
> Link: https://lore.kernel.org/all/FR0P281MB1966273C216630B120ABB6E197E89@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

If you add a "#define DEBUG" before any #includes in mcp251xfd-rx.c
you'll get a debug message like this if the workaround triggers:

| Jan 11 22:53:10 riot kernel: mcp251xfd spi1.0 mcp251xfd0: mcp251xfd_handle_rxif_one: last_valid=0x17395fba3d56bb9c ts=0x17395fba3d08ee9e d=0xffffffffffb23302 data=00 02 80 fa 24 cc 43 41 - Dropping

You can also have a look at the interface statistics. The "RX errors:
fifo" counter is incremented if the workaround triggers.

| $ ip --details -s -s link show mcp251xfd0
| 5: mcp251xfd0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
|     link/can  promiscuity 0 minmtu 0 maxmtu 0
|     can <LOOPBACK,BERR-REPORTING,FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 1000
|           bitrate 1000000 sample-point 0.750
|           tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1 brp 1
|           mcp251xfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp_inc 1
|           dbitrate 4000000 dsample-point 0.700
|           dtq 25 dprop-seg 3 dphase-seg1 3 dphase-seg2 3 dsjw 1 dbrp 1
|           mcp251xfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp_inc 1
|           clock 40000000
|           re-started bus-errors arbit-lost error-warn error-pass bus-off
|           0          0          0          0          0          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 parentbus spi parentdev spi1.0
|     RX:     bytes   packets    errors   dropped    missed     mcast
|         190713920   9871978         0         0         0         0
|     RX errors:  length     crc   frame    fifo overrun
|                      0       0       0       1       0
|                                           ^^^^
|     TX:     bytes   packets    errors   dropped   carrier   collsns
|          95356960   4935989         0         0         0         0
|     TX errors: aborted    fifo  window heartbt transns
|                      0       0       0       0       1

Note: Certain Out-of-Memory situations also increase the fifo error
counter, but I think you'll get a nice OOM error message from the
kernel, too.

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |
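
The d= field in the debug message above is consistent with ts minus
last_valid, read as a signed 64-bit value:
0x17395fba3d08ee9e - 0x17395fba3d56bb9c = -0x4dccfe, which is
0xffffffffffb23302 in two's complement. A minimal sketch of such a
check in C follows. It is not the code from the series; the helper
names are hypothetical, and treating only a strictly negative
difference as old data is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not taken from the driver): signed distance from
 * the newest timestamp already accepted to the timestamp of the frame
 * currently being looked at.
 */
static inline int64_t rx_ts_delta(uint64_t last_valid, uint64_t ts)
{
	/*
	 * The unsigned subtraction wraps modulo 2^64. The cast to a signed
	 * type is implementation-defined in ISO C, but yields the two's
	 * complement value on the compilers commonly used for the kernel.
	 */
	return (int64_t)(ts - last_valid);
}

/*
 * Hypothetical policy (an assumption): a frame whose timestamp lies
 * before the last accepted one is old data and should not be handled
 * again.
 */
static inline bool rx_frame_is_stale(uint64_t last_valid, uint64_t ts)
{
	return rx_ts_delta(last_valid, ts) < 0;
}
```

With the values from the log line, rx_ts_delta() returns -0x4dccfe and
rx_frame_is_stale() reports the frame as old, which matches the
"Dropping" in the message.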