Re: RE: AW: AW: mcp251xfd: Bad message receiption (1/2)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 10.01.2023 15:53:22, Marc Kleine-Budde wrote:
> On 10.01.2023 12:50:18, Thomas.Kopp@xxxxxxxxxxxxx wrote:
> > > For the last case, I attached debugger after the fail an tried to read from the
> > > socket until read returned -1:
> > >     (gdb) p malloc(72)
> > >     (gdb) p read(sockfd,$1,72)
> > >     (gdb) print fprint_canframe_mtu(stdout, $1, "\n", 4, 72) And surprise
> > > surprise, there were more CAN frames in store. Refer to the log/pdf.
> > 
> > I was finally able to reproduce the issue and capture a full SPI/CAN
> > log when it happens. At this point it looks like the tail pointer of
> > the FIFO in the MCP2518FD gets corrupted and the driver does what it's
> > supposed to do and reads out lots of frames...
> 
> You mean the head pointer?

I've reproduced the issue here, too, on a single-core imx6,
net-next/master from end of last year.

I added some tracing to the driver and recoded the following events.

Let me explain how to read this. The "chip" variants are read from the
chip via SPI. The "ring" variants are the cached values in the driver.

                   chip-tail       chip-delta (head-tail)
                     |     chip-head |
                     |       |       |     ring-tail     ring-head     ring-delta
                     |       |       |       |             |             | (head-tail)
| do_rxif_analyze: ct=0x0b ch=0x0c cd=0x01 rt=0x00000c9b rh=0x00000c9c rd=0x01
|         TFERFFIF                Receive FIFO Full Interrupt Flag 
|         TFHRFHIF                Receive FIFO Half Full Interrupt Flag
|         TFNRFNIF   x            Receive FIFO Not Empty Interrupt Flag
          ^^^^^^^^
These are the decoded lowest 3 bits of the FIFOSTA register.

The different runs of the rxif handler (mcp251xfd_handle_rxif()) are
separated by a ("----") line.


Now the analysis:

| --------------------------------------------------------------------------------
| do_rxif_analyze: ct=0x09 ch=0x0a cd=0x01 rt=0x00000c99 rh=0x00000c9a rd=0x01
|         TFERFFIF                Receive FIFO Full Interrupt Flag 
|         TFHRFHIF                Receive FIFO Half Full Interrupt Flag
|         TFNRFNIF   x            Receive FIFO Not Empty Interrupt Flag
| --------------------------------------------------------------------------------
| do_rxif_analyze: ct=0x0a ch=0x0b cd=0x01 rt=0x00000c9a rh=0x00000c9b rd=0x01
|         TFERFFIF                Receive FIFO Full Interrupt Flag 
|         TFHRFHIF                Receive FIFO Half Full Interrupt Flag
|         TFNRFNIF   x            Receive FIFO Not Empty Interrupt Flag
| --------------------------------------------------------------------------------
| do_rxif_analyze: ct=0x0b ch=0x0c cd=0x01 rt=0x00000c9b rh=0x00000c9c rd=0x01
|         TFERFFIF                Receive FIFO Full Interrupt Flag 
|         TFHRFHIF                Receive FIFO Half Full Interrupt Flag
|         TFNRFNIF   x            Receive FIFO Not Empty Interrupt Flag

So far everything looks good. The chip delta ("cd"), i.e. the CAN frames
waiting to be read is 1. The ring delta is 1, too. Meaning 1 CAN frame
has been read.

| --------------------------------------------------------------------------------
| do_rxif_analyze: ct=0x0c ch=0x0f cd=0x03 rt=0x00000c9c rh=0x00000c9f rd=0x03
|         TFERFFIF                Receive FIFO Full Interrupt Flag 
|         TFHRFHIF                Receive FIFO Half Full Interrupt Flag
|         TFNRFNIF   x            Receive FIFO Not Empty Interrupt Flag

Now 3 CAN frame are waiting to be read. Thomas told me that he sees 3
CAN frames waiting too, before things go downhill.

| --------------------------------------------------------------------------------
| do_rxif_analyze: ct=0x0f ch=0x0e cd=0x0f rt=0x00000c9f rh=0x00000cae rd=0x0f
|         TFERFFIF                Receive FIFO Full Interrupt Flag 
|         TFHRFHIF   x            Receive FIFO Half Full Interrupt Flag
|         TFNRFNIF   x            Receive FIFO Not Empty Interrupt Flag

This is strange: chip head "jumped" to 0x0e, this means there are 0x0f
CAN frames waiting. However the FIFO flags (FIFO Half Full set, FIFO
Full not set) are consistent with 0x0f waiting CAN frames, the FIFO size
if 0x10.

Note: The chip tail "ct=0x0f" is consistent, as the "old" chip head was
      0x0f.

| --------------------------------------------------------------------------------
| do_rxif_analyze: ct=0x0e ch=0x0e cd=0x10 rt=0x00000cae rh=0x00000cbe rd=0x10
|         TFERFFIF   x            Receive FIFO Full Interrupt Flag 
|         TFHRFHIF   x            Receive FIFO Half Full Interrupt Flag
|         TFNRFNIF   x            Receive FIFO Not Empty Interrupt Flag

Chip tail is consistent (0x0e) as it equals the old chip head. chip head
is also 0x0e, this means either an empty or a full FIFO, the flags
indicate a full FIFO.

The following events seem to be OK:

| --------------------------------------------------------------------------------
| do_rxif_analyze: ct=0x0e ch=0x0f cd=0x01 rt=0x00000cbe rh=0x00000cbf rd=0x01
|         TFERFFIF                Receive FIFO Full Interrupt Flag 
|         TFHRFHIF                Receive FIFO Half Full Interrupt Flag
|         TFNRFNIF   x            Receive FIFO Not Empty Interrupt Flag
| --------------------------------------------------------------------------------
| do_rxif_analyze: ct=0x0f ch=0x00 cd=0x01 rt=0x00000cbf rh=0x00000cc0 rd=0x01
|         TFERFFIF                Receive FIFO Full Interrupt Flag 
|         TFHRFHIF                Receive FIFO Half Full Interrupt Flag
|         TFNRFNIF   x            Receive FIFO Not Empty Interrupt Flag
| --------------------------------------------------------------------------------

Final note on the number of send and received CAN frames.

The ring head (in the event above) shows 0x0cc0 received CAN frames.
The sequence number in the register dump shows 0x000c9f, which means a
total of 0x000ca0 CAN frames have been send.

| TX-0 Object: 0x03 (0xaf8)
|               id = 0x000002a5
|            flags = 0x00193e08
|    SEQ_MCP2517FD = 0x00001f             Sequence (MCP2517)
|    SEQ_MCP2518FD = 0x000c9f             Sequence (MCP2518)
|             data = 00 03 69 df  c5 ee 62 77

The BDIAG1 register gives a total of 0x0ca0 error free messages, looks
consistent, too Thomas is this RX'ed or TX'ed error free messages?

| BDIAG1: bdiag1(0x03c)=0x00000ca0
[...]
|         EFMSGCNT = 0x0ca0       Error Free Message Counter

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |

Attachment: signature.asc
Description: PGP signature


[Index of Archives]     [Automotive Discussions]     [Linux ARM Kernel]     [Linux ARM]     [Linux Omap]     [Fedora ARM]     [IETF Annouce]     [Security]     [Bugtraq]     [Linux]     [Linux OMAP]     [Linux MIPS]     [eCos]     [Asterisk Internet PBX]     [Linux API]     [CAN Bus]

  Powered by Linux