On 10.08.2022 11:30:53, Mark Bath wrote:
> I hope someone can give me some pointers on what might be causing my
> system an issue or how to debug the issue.
>
> The revision 2 Waveshare Dual channel MCP251xFD CAN HAT was working
> fine in my LAB with 2 or 3 other can devices. Both can channels are
> using standard 29 bit CAN2.0 extended identifiers. The can0 interface
> was running at 250kb, and can1 at 500Kb.
>
> As soon as I moved the device into my live environment I have started
> to have issues.
>
> The 250Kb segment has around 10-15 devices and a bus length of
> something in the order of 40m, properly terminated at each end with a
> 120ohm resistor. The 500kb segment has 2 devices and is maybe 10m in
> length and also properly terminated. Without the PI connected both
> segments are running fine with no reported BER counters. The following
> output is from an embedded linux based device on the network when my
> PI is not connected.
>
> root@Venus:~# ip -details link show can0
> 3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 100
>     link/can promiscuity 0 minmtu 0 maxmtu 0
>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 100
>     bitrate 250000 sample-point 0.875
>     tq 250 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
>     sun4i_can: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>     clock 24000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>
> The bus loading is very low.
>
> root@Venus:~# canbusload can0@250000
>  can0@250000 84 13440 5376 5%
>  can0@250000 133 21280 8512 8%
>  can0@250000 95 15200 6080 6%
>  can0@250000 114 18240 7296 7%
>  can0@250000 105 16800 6720 6%
>  can0@250000 132 21020 8368 8%
>  can0@250000 104 16640 6656 6%
>
> As soon as I attach the PI4 with the revision 2.1 waveshare 251xFD
> dual can hat I start getting BER errors on devices, and the PI is
> reporting RX BER errors.
>
> The embedded Linux device
> root@Venus:~# ip -details link show can0
> 3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 100
>     link/can promiscuity 0 minmtu 0 maxmtu 0
>     can state ERROR-ACTIVE (berr-counter tx 0 rx 83) restart-ms 100
>     bitrate 250000 sample-point 0.875
>     tq 250 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
                                     ^^^^^^^^^^^^^^^^^^

Here the sjw is 50% of phase-seg2.

>     sun4i_can: tseg1 1..16 tseg2 1..8 sjw 1..4 brp 1..64 brp-inc 1
>     clock 24000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>
>
> RaspberryPI4 with the wave share dual can hat
> root@Olaso-PI:~# ip -details link show can0
> 5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 100
>     link/can promiscuity 0 minmtu 0 maxmtu 0
>     can state ERROR-WARNING (berr-counter tx 0 rx 124) restart-ms 100
>     bitrate 250000 sample-point 0.875
>     tq 25 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1
                                      ^^^^^^^^^^^^^^^^^^^

Can you try to configure sjw to 10 on the mcp251xfd for 250 kbit/s?
Which tool are you using to configure the bitrate?

>     mcp251xfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
>     mcp251xfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
>     clock 40000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>
> ip -details link show can1
> 6: can1: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 100
>     link/can promiscuity 0 minmtu 0 maxmtu 0
>     can state ERROR-WARNING (berr-counter tx 0 rx 125) restart-ms 100
>     bitrate 500000 sample-point 0.875
>     tq 25 prop-seg 34 phase-seg1 35 phase-seg2 10 sjw 1
                                      ^^^^^^^^^^^^^^^^^^^

Try a sjw of 5 for 500 kbit/s.
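
A minimal sketch of the arithmetic behind these sjw suggestions, assuming
the usual CAN bit layout of one sync-segment tq followed by prop-seg,
phase-seg1 and phase-seg2. The numbers are copied from the ip output
quoted in this mail; the BitTiming class and its property names are
hypothetical helpers made up for illustration only.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BitTiming:
    """Hypothetical helper holding the fields printed by `ip -details link show`."""
    tq_ns: int        # length of one time quantum in nanoseconds
    prop_seg: int     # all segment lengths and sjw are given in tq
    phase_seg1: int
    phase_seg2: int
    sjw: int

    @property
    def tq_per_bit(self) -> int:
        # One tq for the sync segment plus the three configurable segments.
        return 1 + self.prop_seg + self.phase_seg1 + self.phase_seg2

    @property
    def bitrate(self) -> float:
        return 1e9 / (self.tq_per_bit * self.tq_ns)

    @property
    def sample_point(self) -> float:
        return (1 + self.prop_seg + self.phase_seg1) / self.tq_per_bit

    @property
    def sjw_ns(self) -> int:
        # Largest correction a single resynchronisation may apply.
        return self.sjw * self.tq_ns

    @property
    def sjw_ratio(self) -> float:
        return self.sjw / self.phase_seg2


# Values copied from the quoted ip output.
sun4i_can0 = BitTiming(tq_ns=250, prop_seg=6, phase_seg1=7, phase_seg2=2, sjw=1)
mcp_can0 = BitTiming(tq_ns=25, prop_seg=69, phase_seg1=70, phase_seg2=20, sjw=1)
mcp_can1 = BitTiming(tq_ns=25, prop_seg=34, phase_seg1=35, phase_seg2=10, sjw=1)

# The same mcp251xfd timings with the suggested sjw values.
mcp_can0_sjw10 = replace(mcp_can0, sjw=10)
mcp_can1_sjw5 = replace(mcp_can1, sjw=5)

for name, bt in [
    ("sun4i_can can0", sun4i_can0),
    ("mcp251xfd can0", mcp_can0),
    ("mcp251xfd can0 sjw 10", mcp_can0_sjw10),
    ("mcp251xfd can1", mcp_can1),
    ("mcp251xfd can1 sjw 5", mcp_can1_sjw5),
]:
    print(f"{name:<22} {bt.bitrate:>8.0f} bit/s  sp {bt.sample_point:.3f}  "
          f"sjw {bt.sjw_ns:>3} ns ({bt.sjw_ratio:.0%} of phase-seg2)")
```

With the quoted values, sjw 1 on the mcp251xfd at 250 kbit/s works out to
25 ns (5% of phase-seg2), while the sun4i_can node can correct by up to
250 ns per resynchronisation (50% of phase-seg2). A sjw of 10 at
250 kbit/s and 5 at 500 kbit/s would bring the mcp251xfd to the same 50%
ratio.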

>     mcp251xfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
>     mcp251xfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
>     clock 40000000 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>
> Do you have any idea what might be doing this? I have checked cables,
> termination, voltages, etc and all are OK. I have asked Waveshare and
> basically been given the "not our issue, it's the network" answer.
>
> I have even checked the cable drops between the backbone and PI, by
> plugging alternative devices in and they have behaved correctly.
> Moving the PI to an alternative drop has also not changed anything.
>
> It seems to me that there might be a timing issue, but have no idea
> how to check.

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |