Hello Thomas, On 07.09.2022 19:19:42, Thomas.Kopp@xxxxxxxxxxxxx wrote: > > In other words this means the SJW parameter controls the tolerance of > > the CAN controller against the frequency error compared to other CAN > > controllers. > > > > If the user space doesn't provide a SJW parameter, the kernel chooses > > a default value of 1. This proved to be a good default value for CAN > > controllers, but not anymore with modern controllers. > > > > To get the same oscillator tolerances on controllers with wide bit > > timing registers, choose a default SJW value of Phase Seg2 / 2. This > > results in the following bit timing parameters: > > Thanks for your work on this! May I ask why Phase Seg2 / 2 is chosen? According to "Combining CANopen and SAE J1939 networks" [2], CANopen and J1939 use a SJW of 1: "The CANopen standard CiA 301 allows various bitrates in the range from 10 kBit/s to 1 MBit/s. The J1939-11 specification stipulates 250 kBit/s and is used in the majority of applications. The J1939-14 standard specifies 500 kBit/s for the physical layer. Thereby the bitrate for a shared physical layer is limited to the bitrates 250 kBit/s and 500 kBit/s. Fortunately both standards define the same sample point location at 87,5 % together with a SJW value of 1." I think this is why we implemented a default SJW of 1 in the kernel. As discussed in my patch description a SJW of 1 for controllers that were common at the time these standard were written, with narrow bit timing registers, equals "Phase Seg2 / 2". In "The Configuration of the CAN Bit Timing" [3] the following criteria for the Oscillator Tolerance Range was given: "The tolerance range df for an oscillator’s frequency f_osc around the nominal frequency f_nom with: (1 - df) * f_nom <= f_osc <= (1 + df) * f_nom depends on the proportions of Phase_Seg1, Phase_Seg2, SJW, and the bit time. The maximum tolerance df is the defined by two conditions (both shall be met): min(Phase_Seg1, Phase_Seg2) I: df <= -------------------------------- 2 * (13 * bit_time - Phase_Seg2) SJW II: df <= ------------- 20 * bit_time " The 2nd criteria (see "df2" column) is influenced by SJW, so of interest. Here a sja1000 controller: | Bit timing parameters for sja1000 with 8.000000 MHz ref clock using algo 'v5.19' | _----+--------------=> TSeg1: 1 … 16 | / / _---------=> TSeg2: 1 … 8 | | | / _-----=> SJW: 1 … 4 | | | | / _-=> BRP: 1 … 64 (inc: 1) | | | | | / | nominal | | | | | real Bitrt nom real SampP | Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP Bitrate Error SampP SampP Error df1 df2 BTR0 BTR1 | 1000000 125 2 3 2 1 1 1000000 0.0% 75.0% 75.0% 0.0% 9.8‰ 6.2‰ 0x00 0x14 | 800000 125 3 4 2 1 1 800000 0.0% 80.0% 80.0% 0.0% 7.8‰ 5.0‰ 0x00 0x16 | 666666 125 4 4 3 1 1 666666 0.0% 80.0% 75.0% 6.2% 9.8‰ 4.2‰ 0x00 0x27 | 500000 125 6 7 2 1 1 500000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x00 0x1c | 250000 250 6 7 2 1 2 250000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x01 0x1c | 125000 500 6 7 2 1 4 125000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x03 0x1c | 100000 625 6 7 2 1 5 100000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x04 0x1c | 83333 750 6 7 2 1 6 83333 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x05 0x1c | 50000 1250 6 7 2 1 10 50000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x09 0x1c | 33333 1875 6 7 2 1 15 33333 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x0e 0x1c | 20000 3125 6 7 2 1 25 20000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x18 0x1c | 10000 6250 6 7 2 1 50 10000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x31 0x1c "df2" is 3.1‰ for 500kbit/s or slower. In contrast the mcp251xfd with the current mainline bit timing algorithm (SJW=1): | Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock (CIA recommendation) using algo 'v5.19' | _----+--------------=> TSeg1: 2 … 256 | / / _---------=> TSeg2: 1 … 128 | | | / _-----=> SJW: 1 … 128 | | | | / _-=> BRP: 1 … 256 (inc: 1) | | | | | / | nominal | | | | | real Bitrt nom real SampP | Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP Bitrate Error SampP SampP Error df1 df2 NBTCFG | 1000000 25 14 15 10 1 1 1000000 0.0% 75.0% 75.0% 0.0% 9.8‰ 1.2‰ 0x001c0900 | 800000 25 19 20 10 1 1 800000 0.0% 80.0% 80.0% 0.0% 7.8‰ 1.0‰ 0x00260900 | 666666 25 23 24 12 1 1 666666 0.0% 80.0% 80.0% 0.0% 7.8‰ 0.8‰ 0x002e0b00 | 500000 25 34 35 10 1 1 500000 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.6‰ 0x00440900 | 250000 25 69 70 20 1 1 250000 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.3‰ 0x008a1300 | 125000 50 69 70 20 1 2 125000 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.3‰ 0x018a1300 | 100000 50 87 87 25 1 2 100000 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.2‰ 0x01ad1800 | 83333 50 104 105 30 1 2 83333 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.2‰ 0x01d01d00 | 50000 100 87 87 25 1 4 50000 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.2‰ 0x03ad1800 | 33333 125 104 105 30 1 5 33333 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.2‰ 0x04d01d00 | 20000 250 87 87 25 1 10 20000 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.2‰ 0x09ad1800 | 10000 500 87 87 25 1 20 10000 0.0% 87.5% 87.5% 0.0% 4.9‰ 0.2‰ 0x13ad1800 for slow bit rates "df2" drops to 0.2‰. Here the mcp251xfd with the updated algorithm (SJW = Phase Seg2 / 2): | Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock (CIA recommendation) using algo 'next' | _----+--------------=> TSeg1: 2 … 256 | / / _---------=> TSeg2: 1 … 128 | | | / _-----=> SJW: 1 … 128 | | | | / _-=> BRP: 1 … 256 (inc: 1) | | | | | / | nominal | | | | | real Bitrt nom real SampP | Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP Bitrate Error SampP SampP Error df1 df2 NBTCFG | 1000000 25 14 15 10 5 1 1000000 0.0% 75.0% 75.0% 0.0% 9.8‰ 6.2‰ 0x001c0904 | 800000 25 19 20 10 5 1 800000 0.0% 80.0% 80.0% 0.0% 7.8‰ 5.0‰ 0x00260904 | 666666 25 23 24 12 6 1 666666 0.0% 80.0% 80.0% 0.0% 7.8‰ 5.0‰ 0x002e0b05 | 500000 25 34 35 10 5 1 500000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x00440904 | 250000 25 69 70 20 10 1 250000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x008a1309 | 125000 50 69 70 20 10 2 125000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x018a1309 | 100000 50 87 87 25 12 2 100000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.0‰ 0x01ad180b | 83333 50 104 105 30 15 2 83333 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x01d01d0e | 50000 100 87 87 25 12 4 50000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.0‰ 0x03ad180b | 33333 125 104 105 30 15 5 33333 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.1‰ 0x04d01d0e | 20000 250 87 87 25 12 10 20000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.0‰ 0x09ad180b | 10000 500 87 87 25 12 20 10000 0.0% 87.5% 87.5% 0.0% 4.9‰ 3.0‰ 0x13ad180b The "df2" stays at ~3.1‰ [2] http://web.archive.org/https://www.can-cia.org/fileadmin/resources/documents/proceedings/2013_mmc_koppe2.pdf [3] http://web.archive.org/http://www.oertel-halle.de/files/cia99paper.pdf And I tested with SJW=max from mcp251x (not the mcp251xfd) with SJA=max to a peak USB adapter with SJW=1 - the peak adapter fails to receive CAN frames: | 5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10 | link/can promiscuity 0 minmtu 0 maxmtu 0 | can state ERROR-ACTIVE restart-ms 1000 | bitrate 500000 sample-point 0.850 | tq 100 prop-seg 8 phase-seg1 8 phase-seg2 3 sjw 3 brp 1 | mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp_inc 1 | clock 10000000 | re-started bus-errors arbit-lost error-warn error-pass bus-off | 0 0 0 2 2 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 parentbus spi parentdev spi1.1 | 97: peakfd0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10 | link/can promiscuity 0 minmtu 0 maxmtu 0 | can <FD> state ERROR-PASSIVE (berr-counter tx 0 rx 136) restart-ms 1000 | bitrate 500000 sample-point 0.875 | tq 12 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1 brp 1 | pcan_usb_fd: tseg1 1..256 tseg2 1..128 sjw 1..128 brp 1..1024 brp_inc 1 | dbitrate 2000000 dsample-point 0.750 | dtq 12 dprop-seg 14 dphase-seg1 15 dphase-seg2 10 dsjw 1 dbrp 1 | pcan_usb_fd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..1024 dbrp_inc 1 | clock 80000000 | re-started bus-errors arbit-lost error-warn error-pass bus-off | 0 0 0 1 2 0 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 gro_max_size 65536 parentbus usb parentdev 2-2.2:1.0 | (1970-01-01 02:41:33.856148) peakfd0 RX - - 20000004 [8] 00 04 00 00 00 00 00 00 ERRORFRAME | controller-problem{rx-error-warning} | (1970-01-01 02:41:33.857168) peakfd0 RX - - 20000004 [8] 00 10 00 00 00 00 00 00 ERRORFRAME | controller-problem{rx-error-passive} With SJW = Phase Seg2 / 2, it works. > In general the recommendation is to chose SJW as big as possible. > Here's an excerpt from the CiA 601-3 (Unfortunately I can't share the > entire document) CiA 301 and J1939 seem to have a different opinion... > "Recom2: Choose sjwA as large as possible. > - The maximum possible value is sjwA = min(ps1A, ps2A). > - During arbitration phase, a large sjwA allows a CAN node to resynchronize quickly on the leading transmitting node. > - If the CAN controller does not allow configuring the maximum possible sjwA > because the value range for sjwA is limited in the CAN controller, then consider > increasing BRPA. This allows configuring a larger sjwA, unless this can impact > phase error tolerance in an unwanted way, see Recom4." > > I think we discussed this a while ago already and the biggest > reasoning against maxing SJW was breaking "compatibility" to old > kernel versions. ACK > The compatibility in this is not being able to sync > though... Now, if a change to a larger default values is agreed upon > we might as well chose min(phaseseg1, phaseseg2) directly to maximize > robustness for the user that doesn't set it explicitly. Given that > most of the getting started with Socketcan results on your favorite > Search engine don't even mention SJW that's probably a significant > portion of the users. From a pure communication standpoint I can't see > any downside to doing this, the only hypothetical case would be if a > CAN controller had an erratum about higher SJWs and the driver not > handling it. > > So, in essence, I'd propose to use min(p1,p2). NACK - as I have a setup that doesn't work anymore when mixing an old and a new kernel. regards, Marc -- Pengutronix e.K. | Marc Kleine-Budde | Embedded Linux | https://www.pengutronix.de | Vertretung West/Dortmund | Phone: +49-231-2826-924 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |
Attachment:
signature.asc
Description: PGP signature