Hi,

On Wed, Mar 31, 2021 at 10:37 AM Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx> wrote:
>
> On 28.03.2021 08:31:14, Belisko Marek wrote:
> > I have a beaglebone-based board and I'm performing some tests.
>
> As far as I know the beagle bone boards all have d_can controllers, not
> m_can.

Yes, sorry, that was a typo.

> > I discovered that when the bitrate is set to 500 kbit/s, replaying a CAN
> > log file from a PC to the board shows 4-5 error/overrun frames in the
> > interface statistics, and when comparing the original file with the
> > received one, a few lines of the candump are missing. When the bitrate
> > is decreased to 125 kbit/s, replaying the same file shows no
> > errors/overruns and the files are identical. I'm not a CAN expert, so
> > I'm asking for advice on how to debug this. I'm using a mainline 4.12
> > kernel, which shows this symptom. I compared it against the latest
> > mainline kernel and only a few patches seem able to influence CAN
> > behavior (the others are only cosmetic). I took:
> >
> > 3cb3eaac52c0f145d895f4b6c22834d5f02b8569 - can: c_can: c_can_poll():
> > only read status register after status IRQ
> > 23c5a9488f076bab336177cd1d1a366bd8ddf087 - can: c_can: D_CAN:
> > c_can_chip_config(): perform a sofware reset on open
> > 6f12001ad5e79d0a0b08c599731d45c34cafd376 - can: c_can: C_CAN: add bus
> > recovery events
> >
> > I know the most common answer for such issues is to try the latest
> > kernel (I'm in the process of trying 5.10).
>
> That's going into the right direction.
> Please try the latest
> net-next/master, which includes this merge:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=9c0ee085c49c11381dcbd609ea85e902eab88a92

I built this kernel, ran it on my target, and ran cangen can0 -g0 (at a
500 kbit/s bitrate) on the other side. After some time I see this on the
receiving side:

3: can0: <NOARP,UP,LOWER_UP,ECHO> mtu 16 qdisc pfifo_fast state UP mode DEFAULT group default qlen 10
    link/can promiscuity 0
    can state ERROR-ACTIVE (berr-counter tx 0 rx 0) restart-ms 0
      bitrate 500000 sample-point 0.875
      tq 125 prop-seg 6 phase-seg1 7 phase-seg2 2 sjw 1
      c_can: tseg1 2..16 tseg2 1..8 sjw 1..4 brp 1..1024 brp-inc 1
      clock 24000000
      re-started bus-errors arbit-lost error-warn error-pass bus-off
      0          0          0          0         0          0
    numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
    RX: bytes  packets  errors  dropped  overrun  mcast
    6300263    999976   4       0        4        0
    TX: bytes  packets  errors  dropped  carrier  collsns
    0          0        0       0        0        0

i.e. 4 error/overrun frames. My theory is that interrupts are disabled
before the NAPI handler processes the received data, and by the time the
messages have been processed and the IRQ is re-enabled, the controller's
receive buffers have already overflowed because reading out the data is
too slow. Is there anything I can tune to read the data faster?

Thanks.

> regards,
> Marc
>
> --
> Pengutronix e.K.                 | Marc Kleine-Budde          |
> Embedded Linux                   | https://www.pengutronix.de |
> Vertretung West/Dortmund         | Phone: +49-231-2826-924    |
> Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555  |

BR,
marek
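P.S. For reference, this is roughly the sequence I'm running (the interface
name can0 and the bitrate match the output above; nothing here is a fix,
just the reproduction steps):

```shell
# Receiver: bring the interface up at 500 kbit/s and log what arrives
ip link set can0 up type can bitrate 500000
candump -l can0 &

# Sender (the other node): same bitrate, frames back-to-back (gap 0 ms),
# which is the worst case for the receiver's RX path
ip link set can0 up type can bitrate 500000
cangen can0 -g0

# Afterwards, inspect the error/overrun counters on the receiver
ip -details -statistics link show can0
```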
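P.P.S. To confirm on the socket side that the overruns really are controller
RX overflows (and not socket drops), one can listen for SocketCAN error
frames and check the controller-problem bits. A minimal sketch; the helper
name is_rx_overflow() is mine, the CAN_ERR_* constants are the standard ones
from <linux/can/error.h>:

```c
/* Sketch: decode a SocketCAN error frame and check whether it reports a
 * controller RX overflow, which is what the "overrun" counter reflects.
 * In a real receiver these frames are delivered on a CAN_RAW socket with
 * the CAN_RAW_ERR_FILTER option set to CAN_ERR_MASK. */
#include <stdbool.h>
#include <stdio.h>
#include <linux/can.h>
#include <linux/can/error.h>

/* Return true if this frame is an error frame signalling an RX overflow. */
static bool is_rx_overflow(const struct can_frame *cf)
{
	if (!(cf->can_id & CAN_ERR_FLAG))
		return false;	/* not an error frame at all */
	if (!(cf->can_id & CAN_ERR_CRTL))
		return false;	/* no controller problem reported */
	/* controller details live in data[1] */
	return cf->data[1] & CAN_ERR_CRTL_RX_OVERFLOW;
}

int main(void)
{
	/* Hand-built example frame, as the driver would emit it on overrun */
	struct can_frame cf = {
		.can_id = CAN_ERR_FLAG | CAN_ERR_CRTL,
		.can_dlc = CAN_ERR_DLC,
	};
	cf.data[1] = CAN_ERR_CRTL_RX_OVERFLOW;

	printf("rx overflow: %d\n", is_rx_overflow(&cf));
	return 0;
}
```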