Re: CAN frame submission and dropped frames

Hello,

On 28.11.2018 at 19:39, kernel@xxxxxxxxxxxxxxxx wrote:
> Hi!
> 
> While working on the rewrite of the mcp25xxfd driver to get upstreamed I have
> come across a strange observation with regards to dropped frames:
> 
> Essentially I am running a worst-case CAN2.0 bus saturation test where
> I receive 1M CAN2.0 frames (standard ID, len: 0) at a 1 Mbit/s bus rate
> in 57s (≈ 17500 frames/s).

Do you also get that many interrupts, or even more (for SPI as well)?
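You can compare the counters in /proc/interrupts before and after a test
run, e.g. (the matching names depend on your device tree setup, so this
pattern is just an assumption):

root@raspcm3:~# grep -E 'spi|mcp' /proc/interrupts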

> On a Raspberry Pi 3 I can handle this load from the SPI side without any
> issues or lost packets (even though the driver is still unoptimized and I
> made the decision to submit those optimizations as separate patches on top
> of the basic functionality).
> 
> This means with the following code disabled:
> 	/* allocate an skb with an embedded struct can_frame */
> 	skb = alloc_can_skb(net, &frame);
> 	if (!skb)
> 		return NULL;
> 	/* fill the frame and hand it to the network stack */
> 	frame->can_id = id;
> 	frame->can_dlc = dlc;
> 	memcpy(frame->data, rx->data, len);
> 	netif_rx_ni(skb);
> 
> (Counters are updated before this code is executed)
> 
> But when I enable submission of the frames to the network stack, I get
> lots of dropped packets, the CPU load increases, and I also see packet
> loss on the SPI side due to CPU congestion.
> 
> Here are the stats after 1M packets received without submission to the stack:
> root@raspcm3:~# ip -d -s link show  can0
> 11: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
>     link/can  promiscuity 0
>     can <FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
> 	  bitrate 1000000 sample-point 0.750
> 	  tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1
> 	  mcp25xxfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
> 	  dbitrate 1000000 dsample-point 0.750
> 	  dtq 25 dprop-seg 14 dphase-seg1 15 dphase-seg2 10 dsjw 1
> 	  mcp25xxfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
> 	  clock 40000000
> 	  re-started bus-errors arbit-lost error-warn error-pass bus-off
> 	  0          0          0          0          0          0
>           numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>     RX: bytes  packets  errors  dropped overrun mcast
>     0          1000000  0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     0          0        0       0       0       0
> 
> 
> And here are the stats after a module reload, now with the packet
> submission code enabled (just a module parameter changed):
> root@raspcm3:~# ip -d -s link show  can0
> 12: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
>     link/can  promiscuity 0
>     can <FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
> 	  bitrate 1000000 sample-point 0.750
> 	  tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1
> 	  mcp25xxfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
> 	  dbitrate 1000000 dsample-point 0.750
> 	  dtq 25 dprop-seg 14 dphase-seg1 15 dphase-seg2 10 dsjw 1
> 	  mcp25xxfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
> 	  clock 40000000
> 	  re-started bus-errors arbit-lost error-warn error-pass bus-off
> 	  0          0          0          0          0          0
>           numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>     RX: bytes  packets  errors  dropped overrun mcast
>     0          1000000  0       945334  0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     0          0        0       0       0       0

What CPU load do you see, e.g. with "htop -d50", also compared with the
case above? Did you try increasing the priority of the service thread
(with chrt)?
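Something along these lines might help (the thread name pattern here is
just a guess, check "ps" for the actual name of the driver's IRQ thread):

root@raspcm3:~# chrt -f -p 80 $(pgrep -f 'irq/.*mcp')

This moves the threaded interrupt handler to SCHED_FIFO priority 80.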

The symptoms are similar to what we see with bus errors starving the CPU
on some low-end systems (where we cannot disable the generation of
bus-error frames, e.g. when no cable is connected).
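(Where the controller does allow it, bus-error reporting can be switched
off from user space with

root@raspcm3:~# ip link set can0 type can berr-reporting off

but on the systems mentioned above the bus-error interrupts keep firing
regardless.)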

> A more realistic scenario would be with DLC=8 and looks like this
> (this run took 122.3s):
> root@raspcm3:~# ip -d -s link show  can0
> 13: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
>     link/can  promiscuity 0
>     can <FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
> 	  bitrate 1000000 sample-point 0.750
> 	  tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1
> 	  mcp25xxfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
> 	  dbitrate 1000000 dsample-point 0.750
> 	  dtq 25 dprop-seg 14 dphase-seg1 15 dphase-seg2 10 dsjw 1
> 	  mcp25xxfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
> 	  clock 40000000
> 	  re-started bus-errors arbit-lost error-warn error-pass bus-off
> 	  0          0          0          0          0          0
>           numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>     RX: bytes  packets  errors  dropped overrun mcast
>     8000000    1000000  0       0       0       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     0          0        0       0       0       0

This means the problem does not show up with a payload of 8 bytes!?

> So I am wondering: is there already a good idea how this worst-case
> issue can be avoided in the first place?
> 
> What I could come up with is the idea of:
> * queuing packets in a ring buffer of a certain size
> * having a separate submission thread that pushes messages onto the
>   network stack (essentially the short code above)
> 
> The idea is that this thread would (hopefully) get scheduled on a different
> core so that the CPU resources are used more evenly.

Well, I'm not sure that makes it better. You have to deliver the queued
packets to the network stack in order. Either you queue all messages or
just the small ones until a bigger one arrives. Anyway, this optimization
is very specific.
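
For what it's worth, here is a minimal sketch of such a ring buffer plus
submission thread, built on a kfifo and a kthread. All names are made up
for illustration, nothing here is taken from the actual driver, and
initialization (INIT_KFIFO(), init_waitqueue_head(), kthread_run()) is
omitted for brevity:

	#include <linux/kfifo.h>
	#include <linux/kthread.h>
	#include <linux/netdevice.h>
	#include <linux/skbuff.h>
	#include <linux/wait.h>

	struct rx_queue {
		/* ring of skb pointers, size must be a power of two */
		DECLARE_KFIFO(fifo, struct sk_buff *, 256);
		wait_queue_head_t wait;
		struct task_struct *thread;
	};

	/* IRQ/SPI path: enqueue instead of calling netif_rx_ni() inline */
	static bool rx_queue_add(struct rx_queue *q, struct sk_buff *skb)
	{
		if (!kfifo_put(&q->fifo, skb))
			return false;	/* ring full: caller frees skb, counts a drop */
		wake_up(&q->wait);
		return true;
	}

	/* submission thread: drains the ring to the stack, in order */
	static int rx_queue_thread(void *data)
	{
		struct rx_queue *q = data;
		struct sk_buff *skb;

		while (!kthread_should_stop()) {
			wait_event_interruptible(q->wait,
						 !kfifo_is_empty(&q->fifo) ||
						 kthread_should_stop());
			while (kfifo_get(&q->fifo, &skb))
				netif_rx_ni(skb);
		}
		return 0;
	}

With a single producer (the interrupt handler) and a single consumer (the
thread), the kfifo needs no extra locking, and a full ring maps naturally
onto the "dropped" counter.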

> Logic to switch from inline to deferred queuing could be made dynamic
> based on traffic (i.e., if there is more than one FIFO filled on the
> controller, or there is already something in the queue, then defer
> submission to that separate thread).
> 
> Obviously this leads to delays in submission, but at least for
> medium-length bursts no message is getting lost, dropped or ...
> 
> Is this something the driver should address (as a separate patch)?

Imagine you have two or even four CAN controllers on the board, or a
system with just one CPU.

> Or should there be something in the can framework/stack that could
> handle such situations better?

That would be ideal, but I'm not sure it's feasible. The high interrupt
load alone is already a problem, and likely the SPI overhead as well. And
I think we also suffer from the overhead of the networking stack here.

> Or should I just ignore those “dropped” packets, as this is really
> a worst-case scenario?

If we can do better with reasonable effort and in a portable way, we
should address it, of course. It would be interesting to know how other
CAN controllers behave on that hardware.

Wolfgang.





