AW: AW: AW: mcp251xfd: Bad message receiption

> Would be interesting if you only attache 1 mcp2518fd to the board and then re-run the test.

The CM4 is the "raspberry pi 4 _compute_ module". So same "hardware" as the standard PI4 but with only one
interface attached (on a professional PCB). Error also happens there.

> You said you made some modifications to the kernel, also it would be good to use the _extact_ version you using to reproduce the error.

Does this answer the question?
      root@raspberrypi:~/linux-6.0# git remote -v
      origin  https://github.com/raspberrypi/linux (fetch)
      origin  https://github.com/raspberrypi/linux (push)
      root@raspberrypi:~/linux-6.0# git status
      On branch rpi-6.0.y
      Your branch is up to date with 'origin/rpi-6.0.y'.
      
      Changes not staged for commit:
        (use "git add <file>..." to update what will be committed)
        (use "git restore <file>..." to discard changes in working directory)
              modified:   drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
      
      no changes added to commit (use "git add" and/or "git commit -a")
      root@raspberrypi:~/linux-6.0# git diff
      diff --git a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
      index 68df6d464..5eab9dd86 100644
      --- a/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
      +++ b/drivers/net/can/spi/mcp251xfd/mcp251xfd-core.c
      @@ -648,7 +648,7 @@ static u8 mcp251xfd_get_normal_mode(const struct mcp251xfd_priv *priv)
              u8 mode;
      
              if (priv->can.ctrlmode & CAN_CTRLMODE_LOOPBACK)
      -               mode = MCP251XFD_REG_CON_MODE_INT_LOOPBACK;
      +               mode = MCP251XFD_REG_CON_MODE_EXT_LOOPBACK;
              else if (priv->can.ctrlmode & CAN_CTRLMODE_LISTENONLY)
                      mode = MCP251XFD_REG_CON_MODE_LISTENONLY;
              else if (priv->can.ctrlmode & CAN_CTRLMODE_FD)

Regarding my test program: I used to cross-compile it. I have now moved it to the Pi and compiled it natively with the
Makefile.standalone that I sent you. Surprisingly, only 1 of my 4 instances had failed after 72 hours. When I added the
-g -O2 flags, 3 of the 4 natively built programs failed within 26 hours (similar to the cross-built binaries: gcc-6.4 vs gcc-10.2).

> Can you run this in parallel to the test. Abort with Ctrl+c after the test fails and send me the log file. The last 256 lines should be enough.

Sure, but logging for 24h will be too much for my disk space. I found no utility to handle this, so I quick-hacked rbuflog
https://gist.github.com/stefanalt/dd4e68490d1a4e2a343b0beaa1b0d230. The result was so bizarre that I finally
also added a normal log, which I kept small by running the following commands in parallel:

    candump -D can0,0:0,#FFFFFFFF | tee candump0_tee | rbuflog -n 200 -d 10000 candump0
    loopit -d 10 bash -c 'if [ $( du -k candump0_tee | cut -f1 ) -gt 7000 ] ; then cat /dev/null > candump0_tee ; fi'
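
For illustration only, the idea of keeping just the most recent N log lines can be sketched in C as below. This is not the
rbuflog source from the gist; the names line_ring, ring_init, ring_push, ring_get and ring_free are hypothetical, and the
sketch makes no claim about what the -n and -d options of rbuflog do.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical ring buffer that keeps only the most recent `cap` lines. */
struct line_ring {
    char **slots;   /* cap owned copies of lines, NULL while unused */
    size_t cap;     /* maximum number of lines kept */
    size_t count;   /* number of lines currently stored (<= cap) */
    size_t next;    /* slot that the next push will overwrite */
};

/* Returns 0 on success, -1 if cap is 0 or allocation fails. */
static int ring_init(struct line_ring *r, size_t cap)
{
    if (cap == 0)
        return -1;
    r->slots = calloc(cap, sizeof(*r->slots));
    if (!r->slots)
        return -1;
    r->cap = cap;
    r->count = 0;
    r->next = 0;
    return 0;
}

/* Store a copy of `line`; once the ring is full the oldest line is dropped. */
static int ring_push(struct line_ring *r, const char *line)
{
    size_t len = strlen(line) + 1;
    char *copy = malloc(len);

    if (!copy)
        return -1;
    memcpy(copy, line, len);
    free(r->slots[r->next]);          /* free(NULL) is a no-op */
    r->slots[r->next] = copy;
    r->next = (r->next + 1) % r->cap;
    if (r->count < r->cap)
        r->count++;
    return 0;
}

/* Return the i-th stored line, oldest first, or NULL if i is out of range. */
static const char *ring_get(const struct line_ring *r, size_t i)
{
    size_t oldest;

    if (i >= r->count)
        return NULL;
    oldest = (r->next + r->cap - r->count) % r->cap;
    return r->slots[(oldest + i) % r->cap];
}

/* Release all stored lines and reset the ring. */
static void ring_free(struct line_ring *r)
{
    for (size_t i = 0; i < r->cap; i++)
        free(r->slots[i]);
    free(r->slots);
    r->slots = NULL;
    r->cap = r->count = r->next = 0;
}
```

A logger built on this would call ring_push() for every line read from stdin and write out ring_get(0) up to
ring_get(count - 1) when it is interrupted, so disk usage stays bounded no matter how long the test runs.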

After the fail I saved all logs/registers and sent a test message through the interface (cansend can0 555#55555555). After
that I re-saved everything and attached it as zip. 

The funny part is marked below. Some messages appear 4 times, whereas all other messages appear only twice.
I have seen this happen several times.

  can0  2A5  [16]  00 02 67 A6 C8 CE 0D 37 F9 63 56 62 F6 2B E4 02
  can0  2A5  [16]  00 02 67 A6 C8 CE 0D 37 F9 63 56 62 F6 2B E4 02
  can0  2A5  [16]  00 03 C5 50 54 BE E1 16 7A 3F 70 B8 EE 5A 09 67
  can0  2A5  [16]  00 03 C5 50 54 BE E1 16 7A 3F 70 B8 EE 5A 09 67
  can0  2A5  [16]  00 00 C1 CA 4C 28 70 15 F6 7E 4C EF E1 A2 52 D8     **
  can0  2A5  [16]  00 00 C1 CA 4C 28 70 15 F6 7E 4C EF E1 A2 52 D8     **
  can0  2A5  [16]  00 00 C1 CA 4C 28 70 15 F6 7E 4C EF E1 A2 52 D8     **
  can0  2A5  [16]  00 00 C1 CA 4C 28 70 15 F6 7E 4C EF E1 A2 52 D8     **
  can0  2A5  [16]  00 01 CD 36 DA 92 86 2E 50 67 44 CB A6 B4 83 94   **
  can0  2A5  [16]  00 01 CD 36 DA 92 86 2E 50 67 44 CB A6 B4 83 94   **
  can0  2A5  [16]  00 01 CD 36 DA 92 86 2E 50 67 44 CB A6 B4 83 94   **
  can0  2A5  [16]  00 01 CD 36 DA 92 86 2E 50 67 44 CB A6 B4 83 94   **
  can0  2A5  [16]  00 02 0F 8D FC D0 57 48 F9 C8 5D EF 46 A9 DF 27
  can0  2A5  [16]  00 02 0F 8D FC D0 57 48 F9 C8 5D EF 46 A9 DF 27
  can0  2A5  [16]  00 03 4B 31 FF 18 67 D9 AA ED 07 FB 55 4B C6 FB
  can0  2A5  [16]  00 03 4B 31 FF 18 67 D9 AA ED 07 FB 55 4B C6 FB
  can0  555   [4]  55 55 55 55
  can0  555   [4]  55 55 55 55

Apart from that, the messages match what my application sent (received messages
removed from the following lines, see the zip for those):

0: TX72 (003/002)  2A5#00 03 4B 31 FF 18 67 D9 AA ED 07 FB 55 4B C6 FB
0: TX72 (002/001)  2A5#00 02 0F 8D FC D0 57 48 F9 C8 5D EF 46 A9 DF 27
0: TX72 (001/000)  2A5#00 01 CD 36 DA 92 86 2E 50 67 44 CB A6 B4 83 94
0: TX72 (000/000)  2A5#00 00 C1 CA 4C 28 70 15 F6 7E 4C EF E1 A2 52 D8

And one more observation: once or twice I have seen my application fail
because the SocketCAN read did not return the MTU size:

  + ./sctestself -b -n 4 -l 999 -t 2 -v cmperr,logmsg -F refilldata,leastdots,allowintloopb,stoponerror -d 16 can1
  CAN selftest can1 .
  ERROR: recvfrom: ret=16, errno=0

Is there any obvious explanation for this? I tried to add more output for this case, but it has not happened
again since then.
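
One possible reading, offered as an assumption and not as a diagnosis: on a CAN_RAW socket with CAN_RAW_FD_FRAMES enabled,
a read returns CAN_MTU (16 bytes, sizeof(struct can_frame)) for a classical CAN frame and CANFD_MTU (72 bytes,
sizeof(struct canfd_frame)) for a CAN FD frame. A return value of 16 with errno=0 would then mean that a classical frame
arrived on the socket. A minimal sketch of a check that tells the cases apart follows; the names frame_kind and
classify_read_size are hypothetical and are not taken from sctestself.

```c
#include <sys/types.h>
#include <linux/can.h>

/* Hypothetical classification of a read()/recvfrom() return value. */
enum frame_kind {
    FRAME_CLASSIC,  /* CAN_MTU bytes: struct can_frame */
    FRAME_FD,       /* CANFD_MTU bytes: struct canfd_frame */
    FRAME_INVALID   /* error, short read, or unexpected size */
};

static enum frame_kind classify_read_size(ssize_t nbytes)
{
    if (nbytes == (ssize_t)CAN_MTU)
        return FRAME_CLASSIC;
    if (nbytes == (ssize_t)CANFD_MTU)
        return FRAME_FD;
    return FRAME_INVALID;
}
```

Logging the CAN ID and data of any frame classified as FRAME_CLASSIC, instead of stopping immediately, would show
whether an unexpected classical frame was on the bus at that moment.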

--Stefan

Attachment: 20221230T1945_can0.tgz
Description: 20221230T1945_can0.tgz

