Re: [PATCH 7/7] spi: bcm2835: Speed up FIFO access if fill level is known

kernel@xxxxxxxxxxxxxxxx · Sat, 10 Nov 2018 11:03:17 +0100

> On 08.11.2018, at 08:06, Lukas Wunner <lukas@xxxxxxxxx> wrote:
> 
> The RX and TX FIFO of the BCM2835 SPI master each accommodate 64 bytes
> (16 32-bit dwords).  The CS register provides hints on their fill level:
> 
>   "Bit 19  RXR - RX FIFO needs Reading ([¾] full)
>    0 = RX FIFO is less than [¾] full (or not active TA = 0).
>    1 = RX FIFO is [¾] or more full. Cleared by reading sufficient
>        data from the RX FIFO or setting TA to 0."
> 
>   "Bit 16  DONE - Transfer Done
>    0 = Transfer is in progress (or not active TA = 0).
>    1 = Transfer is complete. Cleared by writing more data to the
>        TX FIFO or setting TA to 0."
> 
>   "If DONE is set [...], write up to 16 [dwords] to SPI_FIFO. [...]
>    If RXR is set read 12 [dwords] data from SPI_FIFO."
> 
>   [Source: Pages 153, 154 and 158 of
>    https://www.raspberrypi.org/app/uploads/2012/02/BCM2835-ARM-Peripherals.pdf
>    Note: The spec is missing the "¾" character, presumably due to
>    copy-pasting from a different charset.  It also incorrectly
>    refers to 16 and 12 "bytes" instead of 32-bit dwords.]
> 
> In short, the RXR bit indicates that 48 bytes can be read and the DONE
> bit indicates 64 bytes can be written.  Leverage this knowledge to read
> or write bytes blindly to the FIFO, without polling whether data can be
> read or free space is available to write.  Moreover, when a transfer is
> starting, the TX FIFO is known to be empty, likewise allowing a blind
> write of 64 bytes.
> 
> This cuts the number of bus accesses in half if the fill level is known.
> Also, the (posted) write accesses can be pipelined on the AXI bus since
> they are no longer interleaved with (non-posted) reads.
> 
> bcm2835_spi_transfer_one_poll() previously switched to interrupt mode
> when exceeding a time limit by calling bcm2835_spi_transfer_one_irq().
> Because the latter now assumes an empty FIFO, it can no longer be called
> directly.  Modify the CS register instead to achieve the same result.
> 
> Signed-off-by: Lukas Wunner <lukas@xxxxxxxxx>
> Cc: Mathias Duckeck <m.duckeck@xxxxxxxxx>
> Cc: Frank Pavlic <f.pavlic@xxxxxxxxx>
> Cc: Martin Sperl <kernel@xxxxxxxxxxxxxxxx>
> Cc: Noralf Trønnes <noralf@xxxxxxxxxxx>
> ---
> drivers/spi/spi-bcm2835.c | 66 ++++++++++++++++++++++++++++++++++++---
> 1 file changed, 62 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/spi/spi-bcm2835.c b/drivers/spi/spi-bcm2835.c
> index 36719d2cc12d..fd9a73963f8c 100644
> --- a/drivers/spi/spi-bcm2835.c
> +++ b/drivers/spi/spi-bcm2835.c
> @@ -72,6 +72,8 @@
> #define BCM2835_SPI_CS_CS_10		0x00000002
> #define BCM2835_SPI_CS_CS_01		0x00000001
> 
> +#define BCM2835_SPI_FIFO_SIZE		64
> +#define BCM2835_SPI_FIFO_SIZE_3_4	48
Not sure if this should not be a DT parameter describing the HW block
and not being hardcoded in the driver.

> @@ -672,13 +728,13 @@ static int bcm2835_spi_transfer_one_poll(struct spi_master *master,
> 	unsigned long timeout;
> 
> 	/* enable HW block without interrupts */
> -	bcm2835_wr(bs, BCM2835_SPI_CS, cs | BCM2835_SPI_CS_TA);
> +	bcm2835_wr(bs, BCM2835_SPI_CS, cs |= BCM2835_SPI_CS_TA);
I believe coding style says: variable assignments should be separated
from function call for readability.
(similar to the code below)
> @@ -700,8 +756,10 @@ static int bcm2835_spi_transfer_one_poll(struct spi_master *master,
> 					    jiffies - timeout,
> 					    bs->tx_len, bs->rx_len);
> 			/* fall back to interrupt mode */
> -			return bcm2835_spi_transfer_one_irq(master, spi,
> -							    tfr, cs);
> +			cs |= BCM2835_SPI_CS_INTR | BCM2835_SPI_CS_INTD;
> +			bcm2835_wr(bs, BCM2835_SPI_CS, cs);
> +			/* tell SPI core to wait for completion */
> +			return 1;
> 		}
> 	}
What is wrong about using bcm2835_spi_transfer_one_irq?
It makes the intention clear that we switch to irq mode.

Why are you open-coding it instead?

If there is a slight different behavior now then I would recommend amending
the transfer_one_irq to handle the behavior.

If it is done by adding a “fill bytes” argument to transfer_one_irq that 
may be set to 0 in this case then the compiler should be able to optimize
the unnecessary code away.

Martin