Hello,

On 29.11.2018 at 16:57, kernel@xxxxxxxxxxxxxxxx wrote:
> Hi Wolfgang!
>> On 28.11.2018, at 21:55, Wolfgang Grandegger <wg@xxxxxxxxxxxxxx> wrote:
>>
>> Hello,
>>
>> On 28.11.2018 at 19:39, kernel@xxxxxxxxxxxxxxxx wrote:
>>> Hi!
>>>
>>> While working on the rewrite of the mcp25xxfd driver to get it upstreamed I
>>> have come across a strange observation with regard to dropped frames:
>>>
>>> Essentially I am running a worst-case CAN2.0 bus saturation test where
>>> I receive 1M CAN2.0 frames (standard ID, len 0) at a 1MHz CAN bus rate
>>> in 57s (about 17500 frames/s).
>>
>> Do you also get that many interrupts, or even more (for SPI as well)?
>
> No - for this specific test the system is heavily loaded doing SPI transfers,
> so the MCP2517FD interrupt line (almost) never goes high during the test.
>
> On top of that, the SPI driver on the bcm283x SoC is heavily optimized
> (mostly for short transfers, to improve the performance of the mcp2515) and
> it will run polling IO for transfers that only take a short time (<30us) and
> only defer to interrupts beyond that - mostly to avoid latencies with
> irq-thread wakeup scheduling.
>
> As the SPI speed is quite high (in my case 12.5MHz), you can transfer the 22
> bytes needed to read the content of a FIFO in 16us - which is acceptable for
> polling (with CS select/deselect this is about 22us).
>
> So my expectation is that for most practical purposes there is no irq on the
> SPI side.
> On the other hand I am seeing on the order of 39k interrupts plus 62k context
> switches per second even though there is no load on the system - but none of
> those interrupts are SPI or mcp2517 related.

You mean "cat /proc/interrupts" reports 100000 interrupts per second, one
every 10 microseconds!?

> Some of the context switching is strange, as both spi0 and the irq-mcp25xxfd
> kernel thread are consuming CPU - but I guess that is an issue with SPI I
> need to look into separately.
>
>>
>>> On a Raspberry Pi3 I can handle this load from the SPI side without any
>>> issues or lost packets (even though the driver is still unoptimized and I
>>> made the decision to have those optimizations submitted as separate patches
>>> on top of basic functionality).
>>>
>>> This means with the following code disabled:
>>>
>>>   skb = alloc_can_skb(net, &frame);
>>>   if (!skb)
>>>           return NULL;
>>>   frame->can_id = id;
>>>   frame->can_dlc = dlc;
>>>   memcpy(frame->data, rx->data, len);
>>>   netif_rx_ni(skb);
>>>
>>> (Counters are updated before this code is executed.)
>>>
>>> But when I enable submission of the frames to the network stack I get lots
>>> of dropped packets, the CPU load increases, and I also see packet loss on
>>> the SPI side due to CPU congestion.
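For reference, here is a self-contained variant of the snippet quoted above
(the helper name mcp_submit_frame() and its error handling are invented; this
is not the actual driver code). One detail worth noting: when the per-CPU
backlog (net.core.netdev_max_backlog) overflows under such a burst,
netif_rx_ni() frees the skb, returns NET_RX_DROP, and the core increments the
interface's rx_dropped counter itself - which would explain a large "dropped"
figure like the one in the statistics below:

  /* Hypothetical, self-contained version of the submission path above;
   * the helper name and its error handling are made up.
   */
  #include <linux/can/dev.h>
  #include <linux/netdevice.h>

  static int mcp_submit_frame(struct net_device *net, canid_t id, u8 dlc,
                              const u8 *data, u8 len)
  {
          struct can_frame *frame;
          struct sk_buff *skb;

          skb = alloc_can_skb(net, &frame);
          if (!skb) {
                  net->stats.rx_dropped++;        /* out of memory */
                  return -ENOMEM;
          }

          frame->can_id = id;
          frame->can_dlc = dlc;
          memcpy(frame->data, data, len);

          /* Under overload netif_rx_ni() may return NET_RX_DROP once the
           * per-CPU backlog (net.core.netdev_max_backlog) is full; the core
           * then frees the skb and bumps the interface's rx_dropped counter.
           */
          return netif_rx_ni(skb) == NET_RX_DROP ? -ENOBUFS : 0;
  }

Checking the return value does not avoid the drops, of course - it only makes
them visible inside the driver.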
>>>
>>> Here are the stats after 1M packets received without submission to the stack:
>>>
>>> root@raspcm3:~# ip -d -s link show can0
>>> 11: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
>>>     link/can promiscuity 0
>>>     can <FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>>>           bitrate 1000000 sample-point 0.750
>>>           tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1
>>>           mcp25xxfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
>>>           dbitrate 1000000 dsample-point 0.750
>>>           dtq 25 dprop-seg 14 dphase-seg1 15 dphase-seg2 10 dsjw 1
>>>           mcp25xxfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
>>>           clock 40000000
>>>           re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>           0          0          0          0          0          0
>>>     numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>>>     RX: bytes  packets  errors  dropped overrun mcast
>>>     0          1000000  0       0       0       0
>>>     TX: bytes  packets  errors  dropped carrier collsns
>>>     0          0        0       0       0       0
>>>
>>> And after a module reload, now with the packet submission code enabled
>>> (just a module parameter changed):
>>>
>>> root@raspcm3:~# ip -d -s link show can0
>>> 12: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
>>>     link/can promiscuity 0
>>>     can <FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>>>           bitrate 1000000 sample-point 0.750
>>>           tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1
>>>           mcp25xxfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
>>>           dbitrate 1000000 dsample-point 0.750
>>>           dtq 25 dprop-seg 14 dphase-seg1 15 dphase-seg2 10 dsjw 1
>>>           mcp25xxfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
>>>           clock 40000000
>>>           re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>           0          0          0          0          0          0
>>>     numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>>>     RX: bytes  packets  errors  dropped  overrun mcast
>>>     0          1000000  0       945334   0       0
>>>     TX: bytes  packets  errors  dropped carrier collsns
>>>     0          0        0       0       0       0
>>
>> What CPU load do you see, e.g. with "htop -d50"?
>
> htop -d 50 shows:
> * with submission:
>   * 1 core 100%
>   * 1 core 25%
> * without submission:
>   * same measurements
>
> this sounds strange...

Then it's no surprise that most of the packets are dropped.

>> Did you try to increase the priority of the service thread (with chrt)?
>
> no - I am getting 100% already.
>
>>
>> The symptoms are similar to what we get with bus-errors starving the CPU
>> on some low-end systems (where we cannot disable the creation of
>> bus-errors, e.g. if no cable is connected).
>
> I cannot comment on that...

It's a different issue then.
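On the chrt suggestion above, for completeness: a minimal userspace sketch
equivalent to "chrt -f -p 80 <pid>" on the driver's irq/worker thread. The
priority value 80 is an arbitrary example, the PID still has to be looked up
(e.g. with pgrep), and as noted above it is unlikely to help while one core
is already fully loaded:

  /* Equivalent of "chrt -f -p 80 <pid>": move the given thread to
   * SCHED_FIFO.  The priority (80) is an arbitrary example value.
   */
  #include <sched.h>
  #include <stdio.h>
  #include <stdlib.h>

  int main(int argc, char **argv)
  {
          struct sched_param sp = { .sched_priority = 80 };

          if (argc != 2) {
                  fprintf(stderr, "usage: %s <pid>\n", argv[0]);
                  return 1;
          }

          if (sched_setscheduler(atoi(argv[1]), SCHED_FIFO, &sp)) {
                  perror("sched_setscheduler");
                  return 1;
          }
          return 0;
  }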
>>
>>> A more realistic scenario would be with DLC=8 and looks like this
>>> (this took 122.3s):
>>>
>>> root@raspcm3:~# ip -d -s link show can0
>>> 13: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 72 qdisc pfifo_fast state UNKNOWN mode DEFAULT group default qlen 10
>>>     link/can promiscuity 0
>>>     can <FD> state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>>>           bitrate 1000000 sample-point 0.750
>>>           tq 25 prop-seg 14 phase-seg1 15 phase-seg2 10 sjw 1
>>>           mcp25xxfd: tseg1 2..256 tseg2 1..128 sjw 1..128 brp 1..256 brp-inc 1
>>>           dbitrate 1000000 dsample-point 0.750
>>>           dtq 25 dprop-seg 14 dphase-seg1 15 dphase-seg2 10 dsjw 1
>>>           mcp25xxfd: dtseg1 1..32 dtseg2 1..16 dsjw 1..16 dbrp 1..256 dbrp-inc 1
>>>           clock 40000000
>>>           re-started bus-errors arbit-lost error-warn error-pass bus-off
>>>           0          0          0          0          0          0
>>>     numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>>>     RX: bytes  packets  errors  dropped overrun mcast
>>>     8000000    1000000  0       0       0       0
>>>     TX: bytes  packets  errors  dropped carrier collsns
>>>     0          0        0       0       0       0
>>
>> This means the problem does not show up with a payload of 8 bytes!?
>
> For all practical purposes I have not seen any drops in this case.
> But in this case I do see interrupts on the int-mcp251xfd side (498k for 1M
> CAN frames).

498000 interrupts for 1M frames, really? Gosh!

>>
>>> So I am wondering: is there already a good idea how this worst-case
>>> issue can be avoided in the first place?
>>>
>>> What I could come up with is the idea of:
>>> * queuing packets in a ring buffer of a certain size
>>> * having a separate submission thread that pushes messages onto the
>>>   network stack (essentially the short code above)
>>>
>>> The idea is that this thread would (hopefully) get scheduled on a different
>>> core so that the CPU resources would be better used.
>>
>> Well, I am not sure if that makes it better. You have to send the queued
>> packets in order to the network stack. Either you queue all messages or
>> just the small ones till a bigger one arrives. Anyway, this optimization
>> is very special.
>
> I agree - that was why I was asking...
>
>>
>>> The logic to switch from inline to deferred queuing could be made dynamic,
>>> based on traffic (i.e. if there is more than one FIFO filled on the
>>> controller, or there is already something in the queue, then defer
>>> submission to that separate thread).
>>>
>>> Obviously this leads to delays in submission, but at least for medium
>>> length bursts of messages no message gets lost or dropped.
>>>
>>> Is this something the driver should address (as a separate patch)?
>>
>> Imagine you have 2 or even 4 CAN controllers on the board, or a system
>> with just one CPU.
>
> As this is very specific to the high-load test,
> I doubt that a second controller can be used under these circumstances.

Me too!

> The reason is that the statistics for the above performance test basically show:
> * the SPI clock is asserted (high) for about 25% of the time - hence the
>   clock is running for about 50% of the time - that is with (effectively)
>   9 cycles per byte where the last cycle is low, so the actual figure is
>   about 12% higher...
> * SPI CS is asserted (low) for about 75% of the time.
> * on top of that there is the SPI framework overhead, which is not easily
>   measurable, but the time between CS-deassert and CS-assert for two
>   back-to-back spi_sync calls is 4-6us (with minimal error handling) -
>   this is the penalty of the SPI framework.
> Taking this into account, the 75% rises to about 99.7% SPI utilization
> (= 75% + (5us * 49400)/1s * 100%, at 49.4k CS asserts/s).
>
> This does not leave much room for more SPI traffic to handle high load.
>
> At a 500kHz CAN bus rate it should be possible - or at higher SPI clock
> rates (I am using 12.5MHz so that I still have a chance to capture the SPI
> traffic with my logic analyzer at a 50MHz sample rate, which is sufficiently
> high).
>
> So there is "space" there as well when interfacing at 20MHz.
>
> A second controller may be possible for all but this torture test.
> CAN FD with an 8MHz data rate and DLC=9 (12 bytes) may be another situation
> where this may break - the driver reads 8 data bytes together with the
> header and everything above that as a separate spi_sync call (the driver
> may become more dynamic based on previous packets in the future, but a
> mis-prediction of DLC=15 while the effective DLC is 8 is bad, as it
> increases the number of bytes transferred over the SPI bus by a factor of
> 3.5 - 78 instead of 22 bytes)!
>
> But that will be another patch.

Well, SPI is a pain (for software).

>>> Or should there be something in the CAN framework/stack that could
>>> handle such situations better?
>>
>> That would be ideal, but I'm not sure if it's feasible. Already the high
>> interrupt load is a problem, and SPI likely as well. And I think we also
>> suffer here from the overhead of the networking stack.
>>
>>> Or should I just ignore those "dropped" packets, as this is really
>>> a worst-case scenario?
>>
>> If we can do it better with reasonable effort and in a portable way, we
>> should address it, of course. It would be interesting to know how other
>> CAN controllers behave on that hardware.
>
> In the future maybe - I just wanted this fact to be on the radar/known...

Keep it simple! I'm not sure this worst-case scenario is a realistic one, as
all the other nodes on the CAN bus would complain as well. Do you have a
chance to do such a test with a FlexCAN or SJA1000?

Wolfgang.
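PS: to make the deferred-submission idea discussed above a bit more concrete,
here is a very rough sketch: the SPI/interrupt side only copies frames into a
small ring buffer, and a separate kernel thread - hopefully scheduled on
another core - drains it and feeds netif_rx_ni(). All names, the kfifo size
and the single-producer/single-consumer assumption are invented for
illustration; this is not a proposed implementation:

  /* Sketch only: queue RX frames on the SPI/irq side and submit them to the
   * network stack from a separate kthread.  Structure and function names as
   * well as the fifo size are invented for this example.
   */
  #include <linux/err.h>
  #include <linux/kfifo.h>
  #include <linux/kthread.h>
  #include <linux/wait.h>
  #include <linux/can/dev.h>

  struct mcp_rx_queue {
          struct net_device *net;
          DECLARE_KFIFO(fifo, struct can_frame, 256); /* one producer, one consumer */
          wait_queue_head_t wait;
          struct task_struct *thread;
  };

  /* Called from the SPI/irq side: only a cheap copy, no skb allocation here. */
  static void mcp_rx_enqueue(struct mcp_rx_queue *q, const struct can_frame *cf)
  {
          if (!kfifo_put(&q->fifo, *cf))
                  q->net->stats.rx_dropped++;     /* ring buffer full */
          wake_up(&q->wait);
  }

  /* Submission thread: drains the fifo and pushes skbs into the stack. */
  static int mcp_rx_worker(void *data)
  {
          struct mcp_rx_queue *q = data;
          struct can_frame cf;

          while (!kthread_should_stop()) {
                  wait_event_interruptible(q->wait,
                                           !kfifo_is_empty(&q->fifo) ||
                                           kthread_should_stop());

                  while (kfifo_get(&q->fifo, &cf)) {
                          struct can_frame *frame;
                          struct sk_buff *skb = alloc_can_skb(q->net, &frame);

                          if (!skb) {
                                  q->net->stats.rx_dropped++;
                                  continue;
                          }
                          *frame = cf;
                          netif_rx_ni(skb);
                  }
          }
          return 0;
  }

  static int mcp_rx_queue_start(struct mcp_rx_queue *q, struct net_device *net)
  {
          q->net = net;
          INIT_KFIFO(q->fifo);
          init_waitqueue_head(&q->wait);
          q->thread = kthread_run(mcp_rx_worker, q, "mcp25xxfd-rx");
          return PTR_ERR_OR_ZERO(q->thread);
  }

Whether this actually helps depends on a second core really being available;
with one core already saturated by SPI transfers, the in-order requirement and
the extra wakeups may just move the problem around, as discussed above.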