Re: AW: [PATCH 0/5] can: mcp251xfd: workaround double-RX erratum

Marc Kleine-Budde <mkl@xxxxxxxxxxxxxx> · Mon, 16 Jan 2023 23:15:59 +0100

On 16.01.2023 19:49:18, Stefan Althöfer wrote:
> > With the help of Thomas we found out that the chip has a time window
> > after receiving a CAN frame where the RX FIFO STA register content
> > is not read correctly.
> 
> Does the workaround assume that most of the messages in the fifo are
> old (already read) content?

Before this change the driver used to read the RX FIFO head from the
chip (it points to the index that is written to next by the driver).
Then all CAN frames between the FIFO tail and head index are read from
the chip's RAM, pushed into the networking stack and finally marked as
read in the chip.

The driver 100% trusts the FIFO head read from the chip, in the bad case
it 1) pushed old messages into the networking stack and 2) confuses the
chip with marking "too many" messages as read. The rest are follow up
errors.

With the patch the driver still reads the RX FIFO head, but calculates
the number of RX'ed CAN frames waiting in the FIFO. (That is the
difference between the FIFO head and tail, plus taking wraparounds and
the FIFO full and empty bits into account.) The driver reads the waiting
CAN frames from the chip's RAM, and iterates over them.

In the good case the timestamps of the messages are always increasing.
(Again taking wraparounds and long pauses between messages into
account.)

In the bad case the driver encounters a timestamp that is older than the
last successfully received one, it assumes due to the erratum. The RX
FIFO head was read as too big, resulting in too many read messages. The
processing is stopped.

The driver still trusts that the RX messages are filled in FIFO manner
into the ring-buffer in the RAM. As soon as an old messages is
encountered, the driver assumes it has read too far and only "old"
messages will follow.

Thomas did some test on a µC, where the RX FIFO head index in read 2x in
a row. With the right timing the 1st head index is broken, while the 2nd
read shows correct data again.

In both the good and bad case now the driver's FIFO head and tail are
updated with the successful processed messages. These number of messages
are marked as read in the chip.

> What happens if there is mostly new (not yet read) content in the
> fifo? Suppose a slow host or coalescing. Can the temporarily incorrect
> RX FIFO STA register point to one of those (ahead of next-to-read)?

Let me explain how I understood you scenario:
- there are 4 messages pending
- the driver reads the FIFO head, read hit by erratum
- driver thinks it has to read 8 messages
- the host is slow and on the CAN bus 2 new messages arrive
- on the chip a 4+2=6 messages pending
- the driver reads 8 messages from the chip
- the driver iterates over the 8 messages
- the timestamp of message 6 is newer than message 7
- the driver stops processing after message 6
- 6 messages are marked are read in the chip
- with the next RX-IRQ the next loop begins...
  note: the next message that is processed is message 7
  because we stopped processing after 6 in the previous loop

> Wouldn't we drop messages or cause a deadlock then?

The driver doesn't "step over" old messages in the FIFO, it stops
processing, finishes, and waits for the next RX-IRQ to come.

Hope that makes it a bit clearer,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |
Attachment:
signature.asc

Description: PGP signature