Re: mcp251xfd receiving non ACKed frames (was: Re: More flags for logging)

Hi Marc,

Thanks for the detailed answer!

On Tue. 4 May 2021 at 16:48, Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx> wrote:
> On 04.05.2021 06:46:17, Vincent MAILHOL wrote:
> > > And even on the mcp251xfd, where I receive the CAN frame, there's no way
> > > to tell if this frame has been acked or not.
>
> The test setup is:
>
>                     flexcan (listen only)
>                              |
>                              |
>    PEAK PCAN-USB FD ---------+--------- mcp2518fd (listen only)
>         (sender)             |
>                              |
>                candlelight (going to be unplugged)
>
> pcan-usb: sending CAN frames
> flexcan: receiving CAN frames - but controller in listen only mode
> mcp2518fd: receiving CAN frames - but controller in listen only mode
> candlelight: receiving CAN frames - first attached, then detached
>
> > The mcp251xfd behavior is interesting. Do you also receive the ACK
> > error flag?
>
> In my tests from yesterday neither the flexcan nor the mcp2518fd had bus
> error reporting enabled. So I haven't noticed any ACK errors on the
> mcp2518fd nor the flexcan.
>
> I just repeated the test with bus error reporting enabled:
>
> On the flexcan I receive _only_ these errors (repeating) with
> candlelight detached:
>
> | (2021-05-04 09:00:30.407709)        can0  RX - -  20000088   [8]  00 00 08 00 00 00 00 00   ERRORFRAME
> |        protocol-violation{{tx-dominant-bit-error}{}}
> |        bus-error
>
>
> On the mcp2518fd I see these errors:
>
> | (2021-05-04 09:05:00.594321)  mcp251xfd0  RX - -  222   [8]  4A 00 00 00 00 00 00 00
> | (2021-05-04 09:05:01.094418)  mcp251xfd0  RX - -  222   [8]  4B 00 00 00 00 00 00 00
> | (2021-05-04 09:05:01.594577)  mcp251xfd0  RX - -  222   [8]  4C 00 00 00 00 00 00 00
> ...unplug candlelight here...
> | (2021-05-04 09:05:02.094878)  mcp251xfd0  RX - -  20000088   [8]  00 00 02 00 00 00 00 00   ERRORFRAME
> |        protocol-violation{{frame-format-error}{}}
> |        bus-error
> | (2021-05-04 09:05:02.095589)  mcp251xfd0  RX - -  20000088   [8]  00 00 02 00 00 00 00 00   ERRORFRAME
> |        protocol-violation{{frame-format-error}{}}
> |        bus-error
> | (2021-05-04 09:05:02.096263)  mcp251xfd0  RX - -  20000088   [8]  00 00 02 00 00 00 00 00   ERRORFRAME
> |        protocol-violation{{frame-format-error}{}}
> |        bus-error
> | (2021-05-04 09:05:02.096934)  mcp251xfd0  RX - -  20000088   [8]  00 00 02 00 00 00 00 00   ERRORFRAME
> |        protocol-violation{{frame-format-error}{}}
> |        bus-error
> | (2021-05-04 09:05:02.097596)  mcp251xfd0  RX - -  20000088   [8]  00 00 02 00 00 00 00 00   ERRORFRAME
> |        protocol-violation{{frame-format-error}{}}
> |        bus-error
> | (2021-05-04 09:05:02.098261)  mcp251xfd0  RX - -  20000088   [8]  00 00 02 00 00 00 00 00   ERRORFRAME
> |        protocol-violation{{frame-format-error}{}}
> |        bus-error
> | (2021-05-04 09:05:02.099035)  mcp251xfd0  RX - -  222   [8]  4D 00 00 00 00 00 00 00
> | (2021-05-04 09:05:02.099054)  mcp251xfd0  RX - -  222   [8]  4D 00 00 00 00 00 00 00
> | (2021-05-04 09:05:02.099603)  mcp251xfd0  RX - -  20000088   [8]  00 00 00 00 00 00 00 00   ERRORFRAME
> |        protocol-violation{{}{}}
> |        bus-error
>
> from here now only RX frames, no error frames

I guess that the above error flags are a consequence of the
interference on the bus while the candlelight was being
unplugged. They are probably not relevant to our specific topic.
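
For reference, here is a minimal sketch of how the error frames
quoted above decode. The constants are copied by hand from
include/uapi/linux/can/error.h; decode_error_frame() is a
hypothetical helper name, not part of any library, and the
returned labels are the kernel macro names rather than the
strings printed by candump.

```python
# Error class bits carried in can_id (include/uapi/linux/can/error.h).
CAN_ERR_FLAG = 0x20000000
ERROR_CLASSES = {
    0x00000001: "CAN_ERR_TX_TIMEOUT",
    0x00000002: "CAN_ERR_LOSTARB",
    0x00000004: "CAN_ERR_CRTL",
    0x00000008: "CAN_ERR_PROT",
    0x00000010: "CAN_ERR_TRX",
    0x00000020: "CAN_ERR_ACK",
    0x00000040: "CAN_ERR_BUSOFF",
    0x00000080: "CAN_ERR_BUSERROR",
    0x00000100: "CAN_ERR_RESTARTED",
}

# Protocol violation type bits carried in data[2].
PROT_TYPES = {
    0x01: "CAN_ERR_PROT_BIT",
    0x02: "CAN_ERR_PROT_FORM",
    0x04: "CAN_ERR_PROT_STUFF",
    0x08: "CAN_ERR_PROT_BIT0",
    0x10: "CAN_ERR_PROT_BIT1",
    0x20: "CAN_ERR_PROT_OVERLOAD",
    0x40: "CAN_ERR_PROT_ACTIVE",
    0x80: "CAN_ERR_PROT_TX",
}


def decode_error_frame(can_id, data):
    """Return (error classes, protocol violation types) of an error frame."""
    if not can_id & CAN_ERR_FLAG:
        raise ValueError("not an error frame")
    classes = [name for bit, name in ERROR_CLASSES.items() if can_id & bit]
    prot = [name for bit, name in PROT_TYPES.items() if data[2] & bit]
    return classes, prot


# First mcp251xfd error frame from the log above.
print(decode_error_frame(0x20000088, bytes.fromhex("0000020000000000")))
```

Note that CAN_ERR_ACK (0x20) is not set in 0x20000088, so none
of the quoted error frames reports a missing acknowledgement.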

> | (2021-05-04 09:05:02.100540)  mcp251xfd0  RX - -  222   [8]  4D 00 00 00 00 00 00 00
> | (2021-05-04 09:05:02.100570)  mcp251xfd0  RX - -  222   [8]  4D 00 00 00 00 00 00 00
> | (2021-05-04 09:05:02.100583)  mcp251xfd0  RX - -  222   [8]  4D 00 00 00 00 00 00 00
> | (2021-05-04 09:05:02.100593)  mcp251xfd0  RX - -  222   [8]  4D 00 00 00 00 00 00 00
> | (2021-05-04 09:05:02.101326)  mcp251xfd0  RX - -  222   [8]  4D 00 00 00 00 00 00 00
>
> ... and repeating.
>
>
> Here a short dump of the mcp2518fd registers:
>
> | INT: intf(0x01c)=0xbf1a0806
> |                 IE      IF      IE & IF
> |         IVMI    x                       Invalid Message Interrupt
> |         WAKI                            Bus Wake Up Interrupt
> |         CERRI   x                       CAN Bus Error Interrupt
> |         SERRI   x                       System Error Interrupt
> |         RXOVI   x       x       x       Receive FIFO Overflow Interrupt
> |         TXATI   x                       Transmit Attempt Interrupt
> |         SPICRCI x                       SPI CRC Error Interrupt
> |         ECCI    x                       ECC Error Interrupt
> |         TEFI    x                       Transmit Event FIFO Interrupt
> |         MODI    x                       Mode Change Interrupt
> |         TBCI            x               Time Base Counter Interrupt
> |         RXI     x       x       x       Receive FIFO Interrupt
> |         TXI                             Transmit FIFO Interrupt
>
> Note: there is no invalid message interrupt pending
>
> | TREC: trec(0x034)=0x00000000
> |             TXBO                Transmitter in Bus Off State
> |             TXBP                Transmitter in Error Passive State
> |             RXBP                Receiver in Error Passive State
> |           TXWARN                Transmitter in Error Warning State
> |           RXWARN                Receiver in Error Warning State
> |            EWARN                Transmitter or Receiver is in Error Warning State
> |              TEC =   0          Transmit Error Counter
> |              REC =   0          Receive Error Counter
> |
> | BDIAG0: bdiag0(0x038)=0x00000010
> |         DTERRCNT =   0          Data Bit Rate Transmit Error Counter
> |         DRERRCNT =   0          Data Bit Rate Receive Error Counter
> |         NTERRCNT =   0          Nominal Bit Rate Transmit Error Counter
> |         NRERRCNT =  16          Nominal Bit Rate Receive Error Counter
> |
> | BDIAG1: bdiag1(0x03c)=0x0000dd4b
> |            DLCMM                DLC Mismatch
> |              ESI                ESI flag of a received CAN FD message was set
> |          DCRCERR                Data CRC Error
> |         DSTUFERR                Data Bit Stuffing Error
> |         DFORMERR                Data Format Error
> |         DBIT1ERR                Data BIT1 Error
> |         DBIT0ERR                Data BIT0 Error
> |          TXBOERR                Device went to bus-off (and auto-recovered)
> |          NCRCERR                CRC Error
> |         NSTUFERR                Bit Stuffing Error
> |         NFORMERR                Format Error
> |          NACKERR                Transmitted message was not acknowledged
> |         NBIT1ERR                Bit1 Error
> |         NBIT0ERR                Bit0 Error
> |         EFMSGCNT = 56651                Error Free Message Counter
>
> > Does the controller retry to send the frame until it gets
> > acknowledged?
>
> Yes - as it should.
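
As a cross-check of the register dump quoted above, the raw
values can be decoded by hand. This is only a sketch: the bit
positions are my reading of the MCP2518FD datasheet (interrupt
flags in the low half of INT, enables in the high half, 8-bit
counters in BDIAG0, a 16-bit EFMSGCNT in the low half of
BDIAG1), and the function names are hypothetical. The results do
agree with the dump.

```python
# Bit position -> interrupt name, assumed identical for the flag half
# (bits 15..0) and the enable half (bits 31..16) of the INT register.
INT_BITS = {
    0: "TXI", 1: "RXI", 2: "TBCI", 3: "MODI", 4: "TEFI",
    8: "ECCI", 9: "SPICRCI", 10: "TXATI", 11: "RXOVI",
    12: "SERRI", 13: "CERRI", 14: "WAKI", 15: "IVMI",
}


def decode_int(value):
    """Map each interrupt name to an (enabled, pending) pair."""
    pending = value & 0xFFFF
    enabled = (value >> 16) & 0xFFFF
    return {
        name: (bool(enabled >> bit & 1), bool(pending >> bit & 1))
        for bit, name in INT_BITS.items()
    }


def decode_bdiag0(value):
    """Split BDIAG0 into its four 8-bit error counters."""
    return {
        "NRERRCNT": value & 0xFF,
        "NTERRCNT": value >> 8 & 0xFF,
        "DRERRCNT": value >> 16 & 0xFF,
        "DTERRCNT": value >> 24 & 0xFF,
    }


def decode_bdiag1(value):
    """Split BDIAG1 into the message counter and the raw error flag bits."""
    return {"EFMSGCNT": value & 0xFFFF, "error_flags": value >> 16 & 0xFFFF}


print(decode_int(0xBF1A0806)["IVMI"])   # invalid message: enabled, not pending
print(decode_bdiag0(0x00000010))
print(decode_bdiag1(0x0000DD4B))
```

With these assumptions, the upper half of BDIAG1 is zero, which
matches the dump showing no error flag (NACKERR included) set on
the mcp2518fd.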

I should have been more careful when reading your previous
message. I could have seen that you sent the message with an
increasing payload and that as soon as the acknowledging node was
removed, the same payload kept repeating again and again.
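
The pattern I missed can be spotted mechanically. Below is a
small sketch that looks for runs of identical consecutive frames
in a trace; find_repeated_runs() is a hypothetical helper, the
input is assumed to contain data frames only (error frames
already filtered out), and a repeated payload is of course only
a hint, since an application may legitimately send the same
frame twice.

```python
def find_repeated_runs(frames):
    """Return (can_id, payload, count) for each run of identical frames.

    `frames` is a sequence of (can_id, payload) tuples in reception order.
    """
    runs = []
    previous, count = None, 0
    for frame in frames:
        if frame == previous:
            count += 1
            continue
        if count > 1:
            runs.append((*previous, count))
        previous, count = frame, 1
    if count > 1:
        runs.append((*previous, count))
    return runs


# Data frames with ID 0x222 from the mcp251xfd log above (first byte only).
trace = [(0x222, p) for p in (0x4A, 0x4B, 0x4C, 0x4D, 0x4D, 0x4D, 0x4D, 0x4D, 0x4D, 0x4D)]
print(find_repeated_runs(trace))
```

On the quoted log, only the 4D payload forms a run, starting
right after the candlelight is unplugged.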

In light of the above information, I have two remarks:

First, the Peak does not generate the ACK error flag as it is
expected to. I do not know if this is a side effect of setting
it to listen only. I would expect the listen only mode to only
impact the reception, but maybe it also has the side effect of
suppressing the error when no ACK bit is received?
Does the Peak correctly send the ACK error flag when sending in
normal mode (not listen only)?

Second, the receiver behaviour when receiving a non-ACKed frame
is actually unspecified. As mentioned before, non-ACKed frames
should be immediately followed by an ACK error flag. Here, the
receiving nodes are facing a situation which should never
occur. The mcp2518fd decides to register the frame as received
and the flexcan decides not to register the frame. I think that
both behaviors are actually fine: in the absence of a
specification, the implementation is free to decide how to
handle this corner case.

In short, the real question is the first point: why didn't the
Peak send the ACK error flag?

> > Are you still able to send frames and receive the echo if there is a
> > single node on the network?
>
> No - But the peak driver/hw has some limitations:
>
> The peak driver doesn't have TX complete signaling, it send the echo
> after sending the TX CAN frame via USB. And the peak controller seems to
> buffer quite a lot TX CAN frames, so it looks for the first ~72 frames
> like the bus is still working.

Yes, I also noticed that when I had peak devices in my test
lab. The peak driver calls can_put_echo_skb() inside
peak_usb_ndo_start_xmit() and thus, the echo frames do not
reflect whether the transmission actually completed on the bus.
I guess fixing that should not be too hard, but I do not have
access to that hardware anymore to do it myself.

I am just surprised by the value of 72 frames. My understanding
is that peak_usb_ndo_start_xmit() should stop the network queue
whenever the number of active tx urbs reaches 10.
Ref:
https://elixir.bootlin.com/linux/latest/source/drivers/net/can/usb/peak_usb/pcan_usb_core.c#L399
https://elixir.bootlin.com/linux/latest/source/drivers/net/can/usb/peak_usb/pcan_usb_core.h#L29
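
To make the reasoning explicit, here is a toy model of the TX
accounting as I understand it. This is not the driver code:
TxQueueModel and frames_accepted() are hypothetical names, the
limit of 10 is the value mentioned above, and the assumption
that an urb completes (and the echo is delivered) as soon as the
USB transfer is done, regardless of what happens on the CAN bus,
is unverified on hardware.

```python
MAX_TX_URBS = 10  # limit on active tx urbs mentioned above


class TxQueueModel:
    """Toy model: the queue stops once MAX_TX_URBS urbs are in flight."""

    def __init__(self, max_urbs=MAX_TX_URBS):
        self.max_urbs = max_urbs
        self.active_tx_urbs = 0
        self.queue_stopped = False
        self.echoed = 0

    def start_xmit(self):
        """Try to queue one frame; return False if the queue is stopped."""
        if self.queue_stopped:
            return False
        self.active_tx_urbs += 1
        if self.active_tx_urbs >= self.max_urbs:
            self.queue_stopped = True
        return True

    def urb_complete(self):
        """USB transfer done (assumption: unrelated to the CAN bus ACK)."""
        self.active_tx_urbs -= 1
        self.echoed += 1
        self.queue_stopped = False


def frames_accepted(n_frames, usb_completes):
    """Count frames accepted and echoed out of n_frames attempts."""
    model = TxQueueModel()
    accepted = 0
    for _ in range(n_frames):
        if model.start_xmit():
            accepted += 1
            if usb_completes:
                model.urb_complete()
    return accepted, model.echoed


print(frames_accepted(72, usb_completes=False))
print(frames_accepted(72, usb_completes=True))
```

If the urbs never completed, the model would stall after 10
frames. If they complete on the USB side while the device
buffers the frames, all 72 are accepted and echoed, which would
be consistent with the buffering you describe.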


Yours sincerely,
Vincent


