Re: [PATCH 5/5] can: bittiming: can_calc_bittiming(): use Phase Seg2 / 2 as default for SJW

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello Thomas,

On 07.09.2022 19:19:42, Thomas.Kopp@xxxxxxxxxxxxx wrote:
> > In other words this means the SJW parameter controls the tolerance of
> > the CAN controller against the frequency error compared to other CAN
> > controllers.
> >
> > If the user space doesn't provide a SJW parameter, the kernel chooses
> > a default value of 1. This proved to be a good default value for CAN
> > controllers, but not anymore with modern controllers.
> >
> > To get the same oscillator tolerances on controllers with wide bit
> > timing registers, choose a default SJW value of Phase Seg2 / 2. This
> > results in the following bit timing parameters:
>
> Thanks for your work on this! May I ask why Phase Seg2 / 2 is chosen?

According to "Combining CANopen and SAE J1939 networks" [2], CANopen and
J1939 use a SJW of 1:

"The CANopen standard CiA 301 allows various bitrates in the range from
 10 kBit/s to 1 MBit/s. The J1939-11 specification stipulates 250 kBit/s
 and is used in the majority of applications. The J1939-14 standard
 specifies 500 kBit/s for the physical layer.

 Thereby the bitrate for a shared physical layer is limited to the
 bitrates 250 kBit/s and 500 kBit/s. Fortunately both standards define
 the same sample point location at 87,5 % together with a SJW value of
 1."

I think this is why we implemented a default SJW of 1 in the kernel.

As discussed in my patch description a SJW of 1 for controllers that
were common at the time these standard were written, with narrow bit
timing registers, equals "Phase Seg2 / 2".

In "The Configuration of the CAN Bit Timing" [3] the following criteria
for the Oscillator Tolerance Range was given:

"The tolerance range df for an oscillator’s frequency f_osc around the
 nominal frequency f_nom with:

 (1 - df) * f_nom <= f_osc <= (1 + df) * f_nom

 depends on the proportions of Phase_Seg1, Phase_Seg2, SJW, and the bit
 time. The maximum tolerance df is the defined by two conditions (both
 shall be met):

              min(Phase_Seg1, Phase_Seg2)
 I:  df <= --------------------------------
           2 * (13 * bit_time - Phase_Seg2)

                SJW
 II: df <= -------------
           20 * bit_time
"

The 2nd criteria (see "df2" column) is influenced by SJW, so of
interest. Here a sja1000 controller:

| Bit timing parameters for sja1000 with 8.000000 MHz ref clock using algo 'v5.19'
|                     _----+--------------=> TSeg1: 1 …   16
|                    /    /     _---------=> TSeg2: 1 …    8
|                   |    |     /    _-----=> SJW:   1 …    4
|                   |    |    |    /    _-=> BRP:   1 …   64 (inc: 1)
|                   |    |    |   |    /
|  nominal          |    |    |   |   |     real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error   df1    df2   BTR0 BTR1
|  1000000    125   2    3    2   1   1  1000000   0.0%  75.0%  75.0%   0.0%   9.8‰   6.2‰   0x00 0x14
|   800000    125   3    4    2   1   1   800000   0.0%  80.0%  80.0%   0.0%   7.8‰   5.0‰   0x00 0x16
|   666666    125   4    4    3   1   1   666666   0.0%  80.0%  75.0%   6.2%   9.8‰   4.2‰   0x00 0x27
|   500000    125   6    7    2   1   1   500000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x00 0x1c
|   250000    250   6    7    2   1   2   250000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x01 0x1c
|   125000    500   6    7    2   1   4   125000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x03 0x1c
|   100000    625   6    7    2   1   5   100000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x04 0x1c
|    83333    750   6    7    2   1   6    83333   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x05 0x1c
|    50000   1250   6    7    2   1  10    50000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x09 0x1c
|    33333   1875   6    7    2   1  15    33333   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x0e 0x1c
|    20000   3125   6    7    2   1  25    20000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x18 0x1c
|    10000   6250   6    7    2   1  50    10000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x31 0x1c

"df2" is 3.1‰ for 500kbit/s or slower.

In contrast the mcp251xfd with the current mainline bit timing algorithm
(SJW=1):

| Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock (CIA recommendation) using algo 'v5.19'
|                     _----+--------------=> TSeg1: 2 …  256
|                    /    /     _---------=> TSeg2: 1 …  128
|                   |    |     /    _-----=> SJW:   1 …  128
|                   |    |    |    /    _-=> BRP:   1 …  256 (inc: 1)
|                   |    |    |   |    /
|  nominal          |    |    |   |   |     real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error   df1    df2       NBTCFG
|  1000000     25  14   15   10   1   1  1000000   0.0%  75.0%  75.0%   0.0%   9.8‰   1.2‰   0x001c0900
|   800000     25  19   20   10   1   1   800000   0.0%  80.0%  80.0%   0.0%   7.8‰   1.0‰   0x00260900
|   666666     25  23   24   12   1   1   666666   0.0%  80.0%  80.0%   0.0%   7.8‰   0.8‰   0x002e0b00
|   500000     25  34   35   10   1   1   500000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.6‰   0x00440900
|   250000     25  69   70   20   1   1   250000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.3‰   0x008a1300
|   125000     50  69   70   20   1   2   125000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.3‰   0x018a1300
|   100000     50  87   87   25   1   2   100000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x01ad1800
|    83333     50 104  105   30   1   2    83333   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x01d01d00
|    50000    100  87   87   25   1   4    50000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x03ad1800
|    33333    125 104  105   30   1   5    33333   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x04d01d00
|    20000    250  87   87   25   1  10    20000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x09ad1800
|    10000    500  87   87   25   1  20    10000   0.0%  87.5%  87.5%   0.0%   4.9‰   0.2‰   0x13ad1800

for slow bit rates "df2" drops to 0.2‰.

Here the mcp251xfd with the updated algorithm (SJW = Phase Seg2 / 2):

| Bit timing parameters for mcp251xfd with 40.000000 MHz ref clock (CIA recommendation) using algo 'next'
|                     _----+--------------=> TSeg1: 2 …  256
|                    /    /     _---------=> TSeg2: 1 …  128
|                   |    |     /    _-----=> SJW:   1 …  128
|                   |    |    |    /    _-=> BRP:   1 …  256 (inc: 1)
|                   |    |    |   |    /
|  nominal          |    |    |   |   |     real  Bitrt    nom   real   SampP
|  Bitrate TQ[ns] PrS PhS1 PhS2 SJW BRP  Bitrate  Error  SampP  SampP   Error   df1    df2       NBTCFG
|  1000000     25  14   15   10   5   1  1000000   0.0%  75.0%  75.0%   0.0%   9.8‰   6.2‰   0x001c0904
|   800000     25  19   20   10   5   1   800000   0.0%  80.0%  80.0%   0.0%   7.8‰   5.0‰   0x00260904
|   666666     25  23   24   12   6   1   666666   0.0%  80.0%  80.0%   0.0%   7.8‰   5.0‰   0x002e0b05
|   500000     25  34   35   10   5   1   500000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x00440904
|   250000     25  69   70   20  10   1   250000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x008a1309
|   125000     50  69   70   20  10   2   125000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x018a1309
|   100000     50  87   87   25  12   2   100000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.0‰   0x01ad180b
|    83333     50 104  105   30  15   2    83333   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x01d01d0e
|    50000    100  87   87   25  12   4    50000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.0‰   0x03ad180b
|    33333    125 104  105   30  15   5    33333   0.0%  87.5%  87.5%   0.0%   4.9‰   3.1‰   0x04d01d0e
|    20000    250  87   87   25  12  10    20000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.0‰   0x09ad180b
|    10000    500  87   87   25  12  20    10000   0.0%  87.5%  87.5%   0.0%   4.9‰   3.0‰   0x13ad180b

The "df2" stays at ~3.1‰

[2] http://web.archive.org/https://www.can-cia.org/fileadmin/resources/documents/proceedings/2013_mmc_koppe2.pdf
[3] http://web.archive.org/http://www.oertel-halle.de/files/cia99paper.pdf


And I tested with SJW=max from mcp251x (not the mcp251xfd) with SJA=max
to a peak USB adapter with SJW=1 - the peak adapter fails to receive CAN
frames:

| 5: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
|     link/can  promiscuity 0 minmtu 0 maxmtu 0
|     can state ERROR-ACTIVE restart-ms 1000
|           bitrate 500000 sample-point 0.850
|           tq 100 prop-seg 8 phase-seg1 8 phase-seg2 3 sjw 3 brp 1
|           mcp251x: tseg1 3..16 tseg2 2..8 sjw 1..4 brp 1..64 brp_inc 1
|           clock 10000000
|           re-started bus-errors arbit-lost error-warn error-pass bus-off
|           0          0          0          2          2          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 parentbus spi parentdev spi1.1

| 97: peakfd0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
|     link/can  promiscuity 0 minmtu 0 maxmtu 0
|     can <FD> state ERROR-PASSIVE (berr-counter tx 0 rx 136) restart-ms 1000
|           bitrate 500000 sample-point 0.875
|           tq 12 prop-seg 69 phase-seg1 70 phase-seg2 20 sjw 1 brp 1
|           pcan_usb_fd: tseg1 1..256 tseg2 1..128 sjw 1..128 brp 1..1024 brp_inc 1
|           dbitrate 2000000 dsample-point 0.750
|           dtq 12 dprop-seg 14 dphase-seg1 15 dphase-seg2 10 dsjw 1 dbrp 1
|           pcan_usb_fd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..1024 dbrp_inc 1
|           clock 80000000
|           re-started bus-errors arbit-lost error-warn error-pass bus-off
|           0          0          0          1          2          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535 gro_max_size 65536 parentbus usb parentdev 2-2.2:1.0

|  (1970-01-01 02:41:33.856148)  peakfd0  RX - -  20000004   [8]  00 04 00 00 00 00 00 00   ERRORFRAME
|         controller-problem{rx-error-warning}
|  (1970-01-01 02:41:33.857168)  peakfd0  RX - -  20000004   [8]  00 10 00 00 00 00 00 00   ERRORFRAME
|         controller-problem{rx-error-passive}

With SJW = Phase Seg2 / 2, it works.

> In general the recommendation is to chose SJW as big as possible.
> Here's an excerpt from the CiA 601-3 (Unfortunately I can't share the
> entire document)

CiA 301 and J1939 seem to have a different opinion... 

> "Recom2: Choose sjwA as large as possible.
> - The maximum possible value is sjwA = min(ps1A, ps2A).
> - During arbitration phase, a large sjwA allows a CAN node to resynchronize quickly on the leading transmitting node.
> - If the CAN controller does not allow configuring the maximum possible sjwA
> because the value range for sjwA is limited in the CAN controller, then consider
> increasing BRPA. This allows configuring a larger sjwA, unless this can impact
> phase error tolerance in an unwanted way, see Recom4."
>
> I think we discussed this a while ago already and the biggest
> reasoning against maxing SJW was breaking "compatibility" to old
> kernel versions.

ACK

> The compatibility in this is not being able to sync
> though... Now, if a change to a larger default values is agreed upon
> we might as well chose min(phaseseg1, phaseseg2) directly to maximize
> robustness for the user that doesn't set it explicitly. Given that
> most of the getting started with Socketcan results on your favorite
> Search engine don't even mention SJW that's probably a significant
> portion of the users. From a pure communication standpoint I can't see
> any downside to doing this, the only hypothetical case would be if a
> CAN controller had an erratum about higher SJWs and the driver not
> handling it.
>
> So, in essence, I'd propose to use min(p1,p2).

NACK - as I have a setup that doesn't work anymore when mixing an old
and a new kernel.

regards,
Marc

--
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Automotive Discussions]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [CAN Bus]

  Powered by Linux