Re: m_can error/overrun frames on high speed

Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx> · Thu, 1 Apr 2021 11:23:52 +0200

On 01.04.2021 11:04:25, Belisko Marek wrote:
> > As far as I know the beagle bone boards all have d_can controllers, not
> > m_can.
> Yes, sorry, it was a typo.

No problem, just wanted to be sure :)

> > > I discovered that with the bitrate set to 500k, replaying a CAN log
> > > file from the PC to the board gives 4-5 error/overrun frames. When
> > > comparing the original file with the received one, a few lines are
> > > missing in candump. With the CAN bitrate decreased to 125 kbit/s,
> > > replaying the same file gives no errors/overruns and the files are
> > > identical. I'm not a CAN expert, so I'm asking for some advice on how
> > > to debug this. I'm using a mainline 4.12 kernel, which shows this
> > > symptom. I compared it with the latest mainline kernel and there are
> > > only a few patches that seem able to influence CAN behavior (the
> > > others are only cosmetic). I took:
> > >
> > > 3cb3eaac52c0f145d895f4b6c22834d5f02b8569 - can: c_can: c_can_poll():
> > > only read status register after status IRQ
> > > 23c5a9488f076bab336177cd1d1a366bd8ddf087 - can: c_can: D_CAN:
> > > c_can_chip_config(): perform a sofware reset on open
> > > 6f12001ad5e79d0a0b08c599731d45c34cafd376 - can: c_can: C_CAN: add bus
> > > recovery events
> > >
> > > I know the usual answer to such issues is to try the latest kernel
> > > (I'm in the process of trying 5.10).
> >
> > That's going in the right direction. Please try the latest
> > net-next/master, which includes this merge:
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=9c0ee085c49c11381dcbd609ea85e902eab88a92

> I built this kernel, and when I run it on my target and run
> cangen can0 -g0 on the other side (at 500 kbit/s), after some time I
> see on the receiving side:

Does the current net-next lead to fewer lost frames than your original
kernel? I mean does it make the situation better?

> 3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
>     link/can  promiscuity 0
>     can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
>           bitrate 500000 sample-point 0.875
>           tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
>           c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
>           clock 24000000
>           re-started bus-errors arbit-lost error-warn error-pass bus-off
>           0          0          0          0          0          0
>     numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
>     RX: bytes  packets  errors  dropped overrun mcast
>     6300263    999976   4       0       4       0
>     TX: bytes  packets  errors  dropped carrier collsns
>     0          0        0       0       0       0
> 
> errors/overrun frames. My theory is that we disable interrupts before
> the NAPI handling of received data, and because reading the data can
> be slow, an overrun can occur before we have processed the received
> messages and re-enabled the IRQ.

Yes, I assume the same problem.
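
For a rough sense of the time budget, the numbers can be checked with a
few lines of Python. This is only a sketch under stated assumptions:
classic CAN with 11-bit identifiers, stuff bits ignored, and helper names
that are purely illustrative. It recomputes the bitrate and sample point
from the bit timing in the ip output above and compares the wire time of
an 8-byte frame at 500 kbit/s and 125 kbit/s.

```python
# Rough timing budget for classic CAN. Stuff bits are ignored, so real
# frames are somewhat longer than the values computed here.

def bits_per_tq_count(prop_seg, phase_seg1, phase_seg2):
    """Time quanta per bit; the leading 1 is the sync segment."""
    return 1 + prop_seg + phase_seg1 + phase_seg2

def bitrate_from_timing(tq_ns, prop_seg, phase_seg1, phase_seg2):
    """Bitrate in bit/s from the values 'ip -details link show' prints."""
    tq_per_bit = bits_per_tq_count(prop_seg, phase_seg1, phase_seg2)
    return 1_000_000_000 // (tq_ns * tq_per_bit)

def sample_point(prop_seg, phase_seg1, phase_seg2):
    """Fraction of the bit time that has elapsed at the sample point."""
    tq_per_bit = bits_per_tq_count(prop_seg, phase_seg1, phase_seg2)
    return (1 + prop_seg + phase_seg1) / tq_per_bit

def frame_time_us(data_bytes, bitrate):
    """Wire time of a standard-ID data frame, interframe space included."""
    # 44 bits of framing + 3 bits interframe space + 8 bits per data byte
    bits = 47 + 8 * data_bytes
    return bits * 1_000_000 / bitrate

print(bitrate_from_timing(tq_ns=125, prop_seg=6, phase_seg1=7, phase_seg2=2))
print(sample_point(prop_seg=6, phase_seg1=7, phase_seg2=2))
print(frame_time_us(8, 500_000), frame_time_us(8, 125_000))
```

The timing values reproduce the reported 500000 bit/s and sample point
0.875. An unstuffed 8-byte frame occupies the bus for a quarter of the
time at 500 kbit/s that it does at 125 kbit/s, so the driver has
correspondingly less time to empty the controller before the next frame
arrives.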

> Is there anything I can tune to have it read faster? Thanks.

I don't think it can be done with tuning. To work around this problem,
you can convert the c_can driver to the rx-offload infrastructure. With
rx-offload you read the received frames from the CAN HW in the IRQ
handler, but pass them to the networking stack from NAPI. This dance is
needed, as otherwise the networking stack messes up the order of
received CAN frames.
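
The idea can be illustrated with a small user-space toy model. This is
not kernel code and does not use the rx-offload API; the slot count, the
class and the function names are all hypothetical and chosen only for
the demonstration. The "controller" has a few receive slots and counts
an overrun when a frame arrives while all of them are full. Without
offload, the slots are only emptied in the deferred, NAPI-like poll.
With offload, an IRQ-like step empties them immediately into a software
queue ordered by timestamp, and the deferred poll delivers from that
queue.

```python
import heapq

HW_SLOTS = 4  # hypothetical number of RX slots, chosen only for the demo

class Controller:
    """Toy CAN controller: frames are lost when all RX slots are full."""

    def __init__(self):
        self.slots = []
        self.overruns = 0

    def receive(self, timestamp, frame):
        if len(self.slots) >= HW_SLOTS:
            self.overruns += 1  # no free slot: the frame is dropped
            return
        self.slots.append((timestamp, frame))

    def drain(self):
        frames, self.slots = self.slots, []
        return frames

def run(n_frames, poll_every, use_offload):
    """Feed n_frames; the deferred poll runs only every poll_every frames."""
    hw = Controller()
    queue = []      # software queue, kept ordered by timestamp
    delivered = []  # what the "networking stack" receives, in order

    def poll():
        if not use_offload:
            # Without offload the hardware is only read here, i.e. late.
            for item in hw.drain():
                heapq.heappush(queue, item)
        while queue:
            delivered.append(heapq.heappop(queue)[1])

    for ts in range(n_frames):
        hw.receive(ts, f"frame{ts}")
        if use_offload:
            # "IRQ handler": empty the hardware slots right away.
            for item in hw.drain():
                heapq.heappush(queue, item)
        if (ts + 1) % poll_every == 0:
            poll()
    poll()  # final flush
    return delivered, hw.overruns

for offload in (False, True):
    frames, overruns = run(20, poll_every=8, use_offload=offload)
    print(offload, overruns, len(frames))
```

In this model the run without offload loses 8 of 20 frames, while the
run with offload loses none and still delivers all frames in their
original order. How large the effect is on real hardware depends on the
controller and on IRQ latency, which the model does not try to capture.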

There is even an old branch that implemented this, but it was never merged:

https://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next.git/log/?h=c_can

Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |