Re: Raspberry Pi running 5.10.x kernel and issues with dual-channel Waveshare 2.1 MCP251xFD CAN HAT

Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx> · Wed, 10 Aug 2022 12:36:05 +0200

On 10.08.2022 11:30:53, Mark Bath wrote:
> I hope someone can give me some pointers on what might be causing my
> system an issue or how to debug the issue.
> 
> The revision 2 Waveshare dual-channel MCP251xFD CAN HAT was working
> fine in my lab with 2 or 3 other CAN devices. Both CAN channels use
> 29-bit CAN 2.0B extended identifiers. The can0 interface was running
> at 250 kbit/s, and can1 at 500 kbit/s.
> 
> As soon as I moved the device into my live environment, I started
> having issues.
> 
> The 250 kbit/s segment has around 10-15 devices and a bus length of
> roughly 40 m, properly terminated at each end with a 120 ohm
> resistor. The 500 kbit/s segment has 2 devices, is about 10 m long
> and is also properly terminated. Without the Pi connected, both
> segments run fine with no bus errors reported (berr-counters stay at
> 0). The following output is from an embedded Linux device on the
> network while my Pi is not connected.
> 
> root@Venus:~# ip -details link show can0
> 3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 100
>     link/can  promiscuity 0 minmtu 0 maxmtu 0 
>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 100 
> 	  bitrate 250000 sample-point 0.875 
> 	  tq 250 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
> 	  sun4i_can: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
> 	  clock 24000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
> 
> The bus loading is very low.
> 
> root@Venus:~# canbusload can0@250000 
>  can0@250000    84   13440   5376   5%
>  can0@250000   133   21280   8512   8%
>  can0@250000    95   15200   6080   6%
>  can0@250000   114   18240   7296   7%
>  can0@250000   105   16800   6720   6%
>  can0@250000   132   21020   8368   8%
>  can0@250000   104   16640   6656   6%
> 
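For reference, the busload column can be recomputed from the third column (used bits total) as a share of the 250 kbit/s bitrate. The sketch below assumes that each canbusload line covers a one-second window and that the percentage is truncated to an integer; both are assumptions that happen to match every row above. `busload_percent` is a hypothetical helper, not part of can-utils.

```python
# Sketch: recompute the busload column of the canbusload output above.
# Assumption: each row covers one second, so load = bits / bitrate,
# truncated to an integer percentage.

BITRATE = 250_000  # can0 nominal bitrate in bit/s

# (frames, used_bits_total, payload_bits, reported_percent), copied from above
rows = [
    (84, 13440, 5376, 5),
    (133, 21280, 8512, 8),
    (95, 15200, 6080, 6),
    (114, 18240, 7296, 7),
    (105, 16800, 6720, 6),
    (132, 21020, 8368, 8),
    (104, 16640, 6656, 6),
]


def busload_percent(bits: int, bitrate: int = BITRATE) -> int:
    """Share of the available bit slots used in one second, truncated."""
    return bits * 100 // bitrate


for frames, bits, _payload, reported in rows:
    print(f"{frames:4d} frames {bits:6d} bits -> {busload_percent(bits)}% "
          f"(reported {reported}%)")
```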
> As soon as I attach the Pi 4 with the revision 2.1 Waveshare MCP251xFD
> dual CAN HAT, the other devices start reporting bus errors, and the Pi
> reports RX bus errors as well.
> 
> The embedded Linux device:
> root@Venus:~# ip -details link show can0
> 3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 100
>     link/can  promiscuity 0 minmtu 0 maxmtu 0 
>     can state ERROR-ACTIVE (berr-counter tx 0 rx 83) restart-ms 100 
> 	  bitrate 250000 sample-point 0.875 
> 	  tq 250 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
                                         ^^^^^^^^^^^^^^^^^^

Here the sjw is 50% of phase-seg2.
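The figures in these dumps can be cross-checked with a little arithmetic: one bit is 1 tq (the sync segment) plus prop-seg, phase-seg1 and phase-seg2, and the sample point sits at the end of phase-seg1. A minimal Python sketch using the sun4i values quoted above; the `BitTiming` class is made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class BitTiming:
    """Bit-timing fields as printed by `ip -details link show`."""
    tq_ns: int       # time quantum in nanoseconds
    prop_seg: int    # in tq
    phase_seg1: int  # in tq
    phase_seg2: int  # in tq
    sjw: int         # in tq

    @property
    def tq_per_bit(self) -> int:
        # sync segment (always 1 tq) + prop-seg + phase-seg1 + phase-seg2
        return 1 + self.prop_seg + self.phase_seg1 + self.phase_seg2

    @property
    def bitrate(self) -> float:
        return 1e9 / (self.tq_ns * self.tq_per_bit)

    @property
    def sample_point(self) -> float:
        # the bit is sampled at the end of phase-seg1
        return (1 + self.prop_seg + self.phase_seg1) / self.tq_per_bit

    @property
    def sjw_ratio(self) -> float:
        # sjw expressed as a fraction of phase-seg2
        return self.sjw / self.phase_seg2


sun4i = BitTiming(tq_ns=250, prop_seg=6, phase_seg1=7, phase_seg2=2, sjw=1)
print(sun4i.bitrate, sun4i.sample_point, sun4i.sjw_ratio)
```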

> 	  sun4i_can: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
> 	  clock 24000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
> 
> 
> Raspberry Pi 4 with the Waveshare dual CAN HAT:
> root@Olaso-PI:~# ip -details link show can0
> 5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 100
>     link/can  promiscuity 0 minmtu 0 maxmtu 0 
>     can state ERROR-WARNING (berr-counter tx 0 rx 124) restart-ms 100 
> 	  bitrate 250000 sample-point 0.875 
> 	  tq 25 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1
                                          ^^^^^^^^^^^^^^^^^^^
Can you try configuring an sjw of 10 on the mcp251xfd for 250 kbit/s?
That is again 50% of phase-seg2.
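One way to see why a larger sjw may help is to look at it in absolute time. On the sun4i node, sjw 1 at tq 250 ns allows 250 ns of phase correction per resynchronisation; on the mcp251xfd, sjw 1 at tq 25 ns allows only 25 ns, one tenth of that. An sjw of 10 brings the mcp251xfd to the same 250 ns. The sketch below only does this multiplication with the values from the two dumps; `sjw_ns` is a hypothetical helper.

```python
# Sketch: SJW in absolute time for the nodes on the 250 kbit/s segment.
# tq and sjw values are copied from the `ip -details link show` dumps.

BIT_TIME_NS = 4000  # one bit at 250 kbit/s


def sjw_ns(tq_ns: int, sjw_tq: int) -> int:
    """Largest phase correction per resynchronisation, in nanoseconds."""
    return tq_ns * sjw_tq


sun4i = sjw_ns(tq_ns=250, sjw_tq=1)           # 250 ns
mcp_now = sjw_ns(tq_ns=25, sjw_tq=1)          # 25 ns
mcp_suggested = sjw_ns(tq_ns=25, sjw_tq=10)   # 250 ns

for name, ns in [("sun4i sjw 1", sun4i),
                 ("mcp251xfd sjw 1", mcp_now),
                 ("mcp251xfd sjw 10", mcp_suggested)]:
    print(f"{name:18s} {ns:4d} ns = {100 * ns / BIT_TIME_NS:.3g}% of a bit")
```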

Which tool are you using to configure the bitrate?
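If the interface is configured with iproute2, sjw can, as far as we know, be passed together with bitrate and sample-point in the same `ip link set ... type can` command, and the kernel's bit-timing calculation then keeps it (clamped to the controller's limits and to phase-seg2). The interface has to be down while the bit timing is changed. The hypothetical helper below only assembles the command line for both channels; it does not run it.

```python
import shlex


def ip_can_timing_cmd(ifname: str, bitrate: int, sample_point: float,
                      sjw: int) -> list[str]:
    """Return the argv for setting bitrate, sample point and SJW via iproute2.

    Run `ip link set <ifname> down` first: the bit timing cannot be
    changed while the interface is up.
    """
    return [
        "ip", "link", "set", ifname, "type", "can",
        "bitrate", str(bitrate),
        "sample-point", str(sample_point),
        "sjw", str(sjw),
    ]


# Values suggested in this thread: sjw 10 at 250 kbit/s, sjw 5 at 500 kbit/s
for ifname, bitrate, sjw in [("can0", 250_000, 10), ("can1", 500_000, 5)]:
    print(shlex.join(ip_can_timing_cmd(ifname, bitrate, 0.875, sjw)))
```

Afterwards the interface is brought back with `ip link set can0 up`, and `ip -details link show can0` should show the new sjw.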

> 	  mcp251xfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
> 	  mcp251xfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
> 	  clock 40000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
> 
> ip -details link show can1
> 6: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 100
>     link/can  promiscuity 0 minmtu 0 maxmtu 0 
>     can state ERROR-WARNING (berr-counter tx 0 rx 125) restart-ms 100 
> 	  bitrate 500000 sample-point 0.875 
> 	  tq 25 prop-seg 34 phase-seg1 35 phase-seg2 10 sjw 1
                                          ^^^^^^^^^^^^^^^^^^^

Try an sjw of 5 (again 50% of phase-seg2) for 500 kbit/s.
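Both suggestions follow the same rule of thumb as the sun4i node: sjw equal to half of phase-seg2. A small sketch of that rule, clamped to the range the driver reports (`sjw 1..128` for the mcp251xfd). The 50% figure is taken from this thread rather than from a specification, and `suggested_sjw` is a hypothetical helper.

```python
def suggested_sjw(phase_seg2: int, sjw_max: int) -> int:
    """Half of phase-seg2, clamped to the range [1, sjw_max].

    Half of phase-seg2 never exceeds phase-seg2, so no extra clamp
    against phase-seg2 is needed.
    """
    return max(1, min(phase_seg2 // 2, sjw_max))


# phase-seg2 values from the mcp251xfd dumps; sjw_max from "sjw 1..128"
print(suggested_sjw(20, 128))  # can0 @ 250 kbit/s -> 10
print(suggested_sjw(10, 128))  # can1 @ 500 kbit/s -> 5
```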

> 	  mcp251xfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
> 	  mcp251xfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
> 	  clock 40000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 
> 
> Do you have any idea what might be causing this? I have checked cables,
> termination, voltages, etc., and all are OK. I have asked Waveshare and
> was basically told "not our issue, it's the network".
> 
> I have even checked the cable drops between the backbone and the Pi by
> plugging in other devices, and they behaved correctly. Moving the Pi
> to a different drop has not changed anything either.
> 
> It seems to me that there might be a timing issue, but I have no idea
> how to check.
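One way to check whether a timing change helps is to watch the error counters before and after it. The sketch below parses the `can state ... (berr-counter tx N rx M)` line of `ip -details link show` output, as seen in the dumps above, and derives the state the counters imply, using the thresholds given in the comments of the Linux `linux/can/netlink.h` header (96, 128, 256). The function names are made up for illustration; feeding it samples taken every few seconds would show whether the rx counter keeps climbing.

```python
import re

# Matches the state line printed by `ip -details link show <can-if>`
STATE_RE = re.compile(
    r"can state (?P<state>[A-Z-]+) "
    r"\(berr-counter tx (?P<tx>\d+) rx (?P<rx>\d+)\)"
)


def parse_can_state(text: str) -> tuple[str, int, int]:
    """Return (state, tx_errors, rx_errors) from `ip -details` output."""
    m = STATE_RE.search(text)
    if m is None:
        raise ValueError("no 'can state' line found")
    return m["state"], int(m["tx"]), int(m["rx"])


def expected_state(tx: int, rx: int) -> str:
    """Error state implied by the counters (thresholds from netlink.h)."""
    if tx >= 256:  # bus-off is only reached through the TX counter
        return "BUS-OFF"
    if tx >= 128 or rx >= 128:
        return "ERROR-PASSIVE"
    if tx >= 96 or rx >= 96:
        return "ERROR-WARNING"
    return "ERROR-ACTIVE"


sample = "can state ERROR-WARNING (berr-counter tx 0 rx 124) restart-ms 100"
state, tx, rx = parse_can_state(sample)
print(state, tx, rx, expected_state(tx, rx))
```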

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |