Hi Oliver!

> On 25.11.2017, at 13:03, Oliver Hartkopp <socketcan@xxxxxxxxxxxx> wrote:
>
> Hello Martin,
>
> thanks for the contribution!
>
> Unfortunately [PATCH 2/2] only hit the devicetree mailing list but not
> the linux-can mailing list for the review.

I have no idea why you have not received it, but I have sent all of
them to the same recipient list! I even got a relayed confirmation for
all 3 of them:

19:35:28.689 2 DEQUEUER [7520159] SMTP(vger.kernel.org)linux-can@xxxxxxxxxxxxxxx relayed: relayed via vger.kernel.org
19:35:29.823 2 DEQUEUER [7520154] SMTP(vger.kernel.org)linux-can@xxxxxxxxxxxxxxx relayed: relayed via vger.kernel.org
19:35:31.902 2 DEQUEUER [7520160] SMTP(vger.kernel.org)linux-can@xxxxxxxxxxxxxxx relayed: relayed via vger.kernel.org

So they should all have arrived!

> Btw. I already have two questions from the description:
>
> On 11/24/2017 07:35 PM, kernel@xxxxxxxxxxxxxxxx wrote:
>
> (..)
>
>> The driver has been heavily optimized so that it can handle
>> a 100% utilized 1MHz CAN bus (with 11-bit CAN frames with DLC=0)
>> even on less powerful SoCs like the Raspberry Pi 1 without
>> dropping frames due to driver/SPI latencies
>
> (..)
>
>> The driver implements a lock-less design for transmissions,
>> instead making use of prepared spi messages submitted via spi_async
>> for transmission in the start_xmit_start code, without requiring
>> an extra workqueue and the corresponding latencies.
>
> Seems you improved the SPI handling here. Would it make sense to
> separate the SPI-related part of the code to a separate C-file so
> that the existing mcp251x driver can benefit from these improvements
> too?

The biggest problem is that the register design of the mcp2517fd is
totally different compared to the mcp251x, so most of those
"optimizations" for the mcp2517fd do not apply to the mcp251x.

(SPI-API wise, IMO the mcp2515 is actually much more efficient - better
still when it has dedicated GPIOs for the RX lines, which avoids having
to query the status via SPI first and thus reduces latencies and packet
loss/overflow on a saturated CAN bus.)

I do have a spi_async-only version of the mcp251x driver as well, but
it has been dormant for a long time, as the mcp251x stopped showing the
interrupt-stopped issues...

>> (still drops are observed in the can/network stack).
>
> Are you sure drops are taking place in the network layer? Can you
> give me some more details about this statement?

Well - wherever the driver increments net->stats.rx_dropped there is
always a dev_warn_ratelimited() right next to it, and rx_errors is
incremented at the same time. Yet I see the dropped counter increase
while there are no messages in dmesg.

Also note that the driver increments rx_packets and rx_bytes only when
netif_rx_ni(skb) is called.
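Schematically, the RX-side accounting boils down to the pattern below.
This is a condensed sketch, not the literal driver code - the function
name and the warning text are made up for illustration:

#include <linux/device.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

static void sketch_rx_deliver(struct net_device *net,
			      struct sk_buff *skb,
			      unsigned int payload_len)
{
	if (!skb) {
		/* every driver-side drop looks like this: dropped and
		 * errors are incremented together, with a rate-limited
		 * warning right next to them */
		net->stats.rx_dropped++;
		net->stats.rx_errors++;
		dev_warn_ratelimited(net->dev.parent,
				     "RX: out of skbs, frame dropped\n");
		return;
	}

	/* rx_packets/rx_bytes are incremented exactly once per frame
	 * that gets handed to the stack */
	net->stats.rx_packets++;
	net->stats.rx_bytes += payload_len;

	/* netif_rx_ni() can still drop the skb internally when the
	 * per-CPU backlog is full - in that case the core increments
	 * rx_dropped itself, without any driver message */
	netif_rx_ni(skb);
}

AFAICS that also explains the silent increments: when the per-CPU
backlog (net.core.netdev_max_backlog) is full, netif_rx_ni() frees the
skb and the core itself bumps rx_dropped, without the driver ever
noticing. A frame can then be counted in rx_packets (by the driver) and
in rx_dropped (by the core) at the same time, which matches the numbers
below.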
So the setup is this:

* Raspberry Pi 2 with the mcp2517fd
* BeagleBone Black
* 1MHz CAN2.0 bus

On the BeagleBone Black:

root@beaglebone:~# cangen can0 -g0.0 -L0 -n 1000000 -p10 -Ii

This saturates the CAN bus to 100%.

On the RPi2 I have no consumer (candump or similar) running and the
module has just been reloaded (counters are reset). I see the following
counters:

7: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can promiscuity 0
    can <ONE-SHOT,BERR-REPORTING> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 1000000 sample-point 0.750
          tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1
          mcp2517fd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
          mcp2517fd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
          clock 40000000
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          0          0          0          0          0
    numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets  errors  dropped overrun mcast
    0          998774   703     4588    0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0

So you see that packets + dropped > transmitted (1000000).

In a different run with 4 bytes of data payload:

root@beaglebone:~# cangen can0 -g0.0 -L0 -n 10000 -L4 -p10 -Ii
root@beaglebone:~# cangen can0 -g0.0 -L0 -n 100000 -L4 -p10 -Ii

I get the following statistics:

root@rasp2a:~# ip -details -statistics link show can0
8: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can promiscuity 0
    can <ONE-SHOT,BERR-REPORTING> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 1000000 sample-point 0.750
          tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1
          mcp2517fd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
          mcp2517fd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
          clock 40000000
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          0          0          0          0          0
    numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets  errors  dropped overrun mcast
    440000     110000   0       4343    0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0

Here I get a 100% reception rate (bytes and packets are exactly what
they are supposed to be), but still 4343 frames (~4%) show up as
dropped.

When CAN FD is configured on the mcp2517fd, things look different even
on a RPi2:

root@beaglebone:~# cangen can0 -g0.0 -L0 -n 1000000 -L0 -p10 -Ii

root@rasp2a:~# ip -details -statistics link show can0
13: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can promiscuity 0
    can <ONE-SHOT,BERR-REPORTING,FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
          bitrate 1000000 sample-point 0.750
          tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1
          mcp2517fd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
          dbitrate 2000000 dsample-point 0.750
          dtq 25 dprop-seg 7 dphase-seg1 7 dphase-seg2 5 dsjw 1
          mcp2517fd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
          clock 40000000
          re-started bus-errors arbit-lost error-warn error-pass bus-off
          0          0          0          0          0          0
    numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets  errors  dropped overrun mcast
    0          999840   3       752028  0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0

In this case the number of packets actually lost on the controller is
about 0.01%, but the number of packets dropped in the stack is 75%.
The shorter the interval, the lower the dropped count.
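Coming back to your question about splitting out the SPI handling: the
TX side essentially boils down to the pattern below. Again a heavily
condensed sketch with made-up names (the slot bitmask, the buffer
layout and the helpers are all illustrative, not the real driver
symbols) - per TX FIFO one spi_message is prepared at open() time, so
start_xmit() only has to fill in the frame data and call spi_async():

#include <linux/bitops.h>
#include <linux/kernel.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/spi/spi.h>
#include <linux/string.h>

#define SKETCH_TX_SLOTS 8

/* one prepared spi_message per TX FIFO, set up once at open() time */
struct sketch_tx_slot {
	struct spi_message msg;
	struct spi_transfer xfer;
	u8 buf[72];			/* SPI cmd + address + TX object */
};

struct sketch_priv {
	struct spi_device *spi;
	unsigned long tx_free;		/* bitmask of free slots,
					 * all-ones after open() */
	struct sketch_tx_slot slot[SKETCH_TX_SLOTS];
};

static netdev_tx_t sketch_start_xmit(struct sk_buff *skb,
				     struct net_device *net)
{
	struct sketch_priv *priv = netdev_priv(net);
	unsigned long i;

	/* claim a free TX slot without taking any lock; in the real
	 * driver the queue is stopped when the last free slot is
	 * taken, so the NETDEV_TX_BUSY case is just belt-and-braces */
	do {
		i = find_first_bit(&priv->tx_free, SKETCH_TX_SLOTS);
		if (i == SKETCH_TX_SLOTS) {
			netif_stop_queue(net);
			return NETDEV_TX_BUSY;
		}
	} while (!test_and_clear_bit(i, &priv->tx_free));

	/* a real driver would translate the CAN(-FD) frame into the
	 * controller's TX object layout here */
	memcpy(priv->slot[i].buf + 8, skb->data,
	       min_t(unsigned int, skb->len,
		     sizeof(priv->slot[i].buf) - 8));

	/* submit directly from start_xmit - no workqueue involved; the
	 * spi_message completion callback releases the slot again and
	 * wakes the queue */
	spi_async(priv->spi, &priv->slot[i].msg);

	consume_skb(skb);	/* the real driver uses can_put_echo_skb() */
	return NETDEV_TX_OK;
}

Nothing in this path sleeps or schedules, which is what avoids the
workqueue latencies mentioned in the patch description.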
Ciao,
Martin

P.S.: For the last test case, here are the FIFO utilization statistics:

root@rasp2a:~# cat /sys/kernel/debug/mcp2517fd-can0/fifo_usage/* | awk '{ C=C+$1; printf "%8i %s\n",C,$0}'
   Total Count/fifo
  384401 384401
  674306 289905
  960133 285827
  990117 29984
  994351 4234
  996745 2394
  997668 923
  998217 549
  998593 376
  998849 256
  999059 210
  999240 181
  999385 145
  999508 123
  999604 96
  999680 76
  999745 65
  999797 52
  999840 43

root@rasp2a:~# cat /sys/kernel/debug/mcp2517fd-can0/rx/fifo_count
19

root@rasp2a:~# dmesg
[77843.229423] mcp2517fd spi0.0: RX MAB overflow
[77856.329824] mcp2517fd spi0.0: RX MAB overflow
[77883.299221] mcp2517fd spi0.0: RX MAB overflow

These dmesg lines correspond to the errors (and would also increase
"dropped" by 3).