On 29.04.2022 23:31:28, Pavel Pisa wrote:
> > Split into separate patches and applied.
>
> Excuse me for the late reply, and thanks a lot for the split into the
> preferred form. Matej Vasilevski has tested the updated
> linux-can-next testing branch on a Xilinx Zynq 7000 based MZ_APO
> board and used it with his patches to proceed with the next round of
> testing of Jan Charvat's NuttX TWAI (CAN) driver on the ESP32C3. We
> plan to send the CTU CAN FD timestamping for RFC/discussion soon.

Sounds good!

> I would like to thank Andrew Dennison, who implemented, tested and
> shares an integration with LiteX and RISC-V:
>
>   https://github.com/litex-hub/linux-on-litex-vexriscv
>
> He uses a development version of the CTU CAN FD IP core with a
> configurable number of Tx buffers (2 to 8), which will require
> automatic setup logic in the driver.
>
> I need to discuss the actual state and his plans with Ondrej Ille.
> But basically, ntxbufs in ctucan_probe_common() has to be assigned
> from the TXTB_INFO TXT_BUFFER_COUNT field. On older core versions
> the TXT_BUFFER_COUNT field bits should read as zero, so when the
> value is zero, the original version with fixed 4 buffers can be
> recognized.

Makes sense.

> When the value is configurable, then for an (uncommon) number of
> buffers which is not a power of two, there will likely be a problem
> with the way the buffer queue is implemented:
>
>	txtb_id = priv->txb_head % priv->ntxbufs;
>	...
>	priv->txb_head++;
>	...
>	priv->txb_tail++;
>
> When I provided an example of this type of queue many years ago, I
> probably showed power-of-2 masking; modulo by an arbitrary number
> does not work across sequence counter overflow.
> Which means adding two "if"s there, unfortunately:
>
>	if (++priv->txb_tail == 2 * priv->ntxbufs)
>		priv->txb_tail = 0;

There's another way to implement this, here for ring->obj_num being a
power of 2:

| static inline u8 mcp251xfd_get_tx_head(const struct mcp251xfd_tx_ring *ring)
| {
| 	return ring->head & (ring->obj_num - 1);
| }
|
| static inline u8 mcp251xfd_get_tx_tail(const struct mcp251xfd_tx_ring *ring)
| {
| 	return ring->tail & (ring->obj_num - 1);
| }
|
| static inline u8 mcp251xfd_get_tx_free(const struct mcp251xfd_tx_ring *ring)
| {
| 	return ring->obj_num - (ring->head - ring->tail);
| }

If you want to allow a non-power-of-2 ring->obj_num, use
"% ring->obj_num" instead of "& (ring->obj_num - 1)".

I'm not sure if there is a real-world benefit (only a gut feeling, it
should be measured) of using more than 4, but fewer than 8, TX
buffers. You can make use of more TX buffers if you implement (fully
hardware based) TX IRQ coalescing (== handle more than one TX
complete interrupt at a time) like in the mcp251xfd driver, or BQL
support (== send more than one TX CAN frame at a time).

I've played a bit with BQL support in the mcp251xfd driver (which is
attached via SPI), but with mixed results. Probably an issue with
proper configuration.

> We need the 2 * priv->ntxbufs range to distinguish an empty queue
> from a full one... But modulo is not nice either, so I will probably
> come up with some other solution in the longer term. In the long
> term, I want to implement virtual queues to allow multiqueue to use
> the dynamic Tx priority of up to 8 buffers...

ACK, multiqueue TX support would be nice for things like the Earliest
TX Time First scheduler (ETF): 1 TX queue for ETF, the other for bulk
messages.

regards,
Marc

-- 
Pengutronix e.K.                 | Marc Kleine-Budde           |
Embedded Linux                   | https://www.pengutronix.de  |
Vertretung West/Dortmund         | Phone: +49-231-2826-924     |
Amtsgericht Hildesheim, HRA 2686 | Fax:   +49-5121-206917-5555 |