Hi!

While working on the rewrite of the mcp25xxfd driver to get it upstreamed, I
have come across a strange observation regarding dropped frames.

Essentially I am running a worst-case CAN 2.0 bus saturation test where I
receive 1M CAN 2.0 frames (standard ID, Len: 0) at a CAN bus rate of 1 Mbit/s
in 57s (about 17500 frames/s).

On a Raspberry Pi 3 I can handle this load from the SPI side without any
issues or lost packets (even though the driver is still unoptimized; I decided
to submit those optimizations as separate patches on top of the basic
functionality).

That is with the following code disabled:

	skb = alloc_can_skb(net, &frame);
	if (!skb)
		return NULL;
	frame->can_id = id;
	frame->can_dlc = dlc;
	memcpy(frame->data, rx->data, len);
	netif_rx_ni(skb);

(Counters are updated before this code is executed.)

But when I enable submission of the frames to the network stack, I get lots of
dropped packets, the CPU load increases, and I also see packet loss on the SPI
side due to CPU congestion.
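As a sanity check on the test load described above, the observed frame rate can
be compared with the nominal ceiling of the bus. The sketch below assumes the
usual CAN 2.0 standard-ID data frame layout (44 framing bits plus 8 bits per
data byte, plus 3 bits of interframe space) and ignores stuff bits, so the real
ceiling is somewhat lower. The function names are made up for illustration.

```c
#include <stdint.h>

/*
 * Bits on the wire for a CAN 2.0 data frame with a standard (11-bit) ID,
 * NOT counting stuff bits: 44 framing bits + 8 bits per data byte
 * + 3 bits of interframe space. (Hypothetical helper, illustration only.)
 */
static unsigned int can20_std_frame_bits(unsigned int dlc)
{
	if (dlc > 8)
		dlc = 8;	/* CAN 2.0 carries at most 8 data bytes */
	return 44u + 8u * dlc + 3u;
}

/* Upper bound on frames per second at a given bitrate, ignoring stuffing. */
static double can20_max_frames_per_sec(unsigned int bitrate, unsigned int dlc)
{
	return (double)bitrate / can20_std_frame_bits(dlc);
}

/* Frame rate actually observed in a test run. */
static double observed_frames_per_sec(unsigned long frames, double seconds)
{
	return (double)frames / seconds;
}
```

With these helpers, can20_max_frames_per_sec(1000000, 0) gives the unstuffed
bound for the Len: 0 test, and observed_frames_per_sec(1000000, 57.0) gives the
rate of the run above.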
Here are the stats after 1M packets received without submission to the stack:

root@raspcm3:~# ip -d -s link show can0
11: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0
    can <FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
	  bitrate 1000000 sample-point 0.750
	  tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1
	  mcp25xxfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
	  dbitrate 1000000 dsample-point 0.750
	  dtq 25 dprop-seg 14 dphase-seg1 15 dphase-seg2 10 dsjw 1
	  mcp25xxfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
	  clock 40000000
	  re-started bus-errors arbit-lost error-warn error-pass bus-off
	  0          0          0          0          0          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets  errors  dropped overrun mcast
    0          1000000  0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0

And after a module reload, now with the packet submission code enabled (just a
module parameter changed):

root@raspcm3:~# ip -d -s link show can0
12: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0
    can <FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
	  bitrate 1000000 sample-point 0.750
	  tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1
	  mcp25xxfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
	  dbitrate 1000000 dsample-point 0.750
	  dtq 25 dprop-seg 14 dphase-seg1 15 dphase-seg2 10 dsjw 1
	  mcp25xxfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
	  clock 40000000
	  re-started bus-errors arbit-lost error-warn error-pass bus-off
	  0          0          0          0          0          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets  errors  dropped overrun mcast
    0          1000000  0       945334  0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0

A more realistic scenario would be with DLC=8, which looks like this (this
took 122.3s):

root@raspcm3:~# ip -d -s link show can0
13: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
    link/can  promiscuity 0
    can <FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
	  bitrate 1000000 sample-point 0.750
	  tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1
	  mcp25xxfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
	  dbitrate 1000000 dsample-point 0.750
	  dtq 25 dprop-seg 14 dphase-seg1 15 dphase-seg2 10 dsjw 1
	  mcp25xxfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
	  clock 40000000
	  re-started bus-errors arbit-lost error-warn error-pass bus-off
	  0          0          0          0          0          0         numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets  errors  dropped overrun mcast
    8000000    1000000  0       0       0       0
    TX: bytes  packets  errors  dropped carrier collsns
    0          0        0       0       0       0

So I am wondering: is there already a good idea of how this worst-case issue
can be avoided in the first place?

What I could come up with is the idea of:
* queuing packets in a ring buffer of a certain size
* having a separate submission thread that pushes messages onto the network
  stack (essentially the short code above)

The idea is that this thread would (hopefully) get scheduled on a different
core, so that the CPU resources would be better used.

The switch from inline to deferred submission could be made dynamically based
on traffic (i.e. if more than one FIFO is filled on the controller, or there
is already something in the queue, then defer submission to that separate
thread).

Obviously this leads to delays in submission, but at least for medium-length
bursts of messages no message gets lost, dropped or ...

Is this something the driver should address (as a separate patch)?

Or should there be something in the CAN framework/stack that could handle such
situations better?

Or should I just ignore those "dropped" packets, as this is really a
worst-case scenario?
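To make the ring-buffer idea above a bit more concrete, here is a minimal
userspace sketch of a single-producer/single-consumer queue together with the
"defer or submit inline" decision. It is not driver code: rx_frame, rx_ring,
RX_RING_SIZE and the rx_* functions are hypothetical names, the ring depth is
an arbitrary assumption, and in the kernel an existing facility such as kfifo
would likely be a better fit than a hand-rolled ring. The producer would be the
SPI/IRQ path, the consumer the separate submission thread.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RX_RING_SIZE 256u	/* assumed depth; must be a power of two */

struct rx_frame {		/* hypothetical stand-in for a received frame */
	uint32_t id;
	uint8_t dlc;
	uint8_t data[8];
};

struct rx_ring {
	struct rx_frame slot[RX_RING_SIZE];
	atomic_uint head;	/* next slot to write, advanced by the producer only */
	atomic_uint tail;	/* next slot to read, advanced by the consumer only */
};

static void rx_ring_init(struct rx_ring *r)
{
	memset(r->slot, 0, sizeof(r->slot));
	atomic_init(&r->head, 0);
	atomic_init(&r->tail, 0);
}

/* Producer side: queue one frame; returns false (frame dropped) if full. */
static bool rx_ring_push(struct rx_ring *r, const struct rx_frame *f)
{
	unsigned int head = atomic_load_explicit(&r->head, memory_order_relaxed);
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	if (head - tail == RX_RING_SIZE)
		return false;
	r->slot[head & (RX_RING_SIZE - 1)] = *f;
	/* publish the slot contents before making the new head visible */
	atomic_store_explicit(&r->head, head + 1, memory_order_release);
	return true;
}

/* Consumer side: fetch the oldest frame; returns false if the ring is empty. */
static bool rx_ring_pop(struct rx_ring *r, struct rx_frame *out)
{
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
	unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);

	if (head == tail)
		return false;
	*out = r->slot[tail & (RX_RING_SIZE - 1)];
	/* release the slot for reuse by the producer */
	atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
	return true;
}

/*
 * Heuristic from the text: defer to the submission thread if more than one
 * controller FIFO is filled, or if frames are already queued (so that
 * ordering is preserved); otherwise submit inline.
 */
static bool rx_should_defer(struct rx_ring *r, unsigned int fifos_pending)
{
	unsigned int head = atomic_load_explicit(&r->head, memory_order_acquire);
	unsigned int tail = atomic_load_explicit(&r->tail, memory_order_acquire);

	return fifos_pending > 1 || head != tail;
}
```

In this sketch a full ring still drops the frame (rx_ring_push() returns
false), so it only absorbs bursts up to RX_RING_SIZE frames; sustained overload
would still lose frames.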

Thanks,
	Martin

P.S.: I am running in mixed CAN FD mode, but CAN 2.0 messages get submitted as
CAN frames, not CAN FD frames, when the controller does not set the CANFD flag
for the message.