So far the BCM2835 SPI driver cannot cope with TX-only and RX-only
transfers (rx_buf or tx_buf is NULL) when using DMA:  It relies on
the SPI core to convert them to full-duplex transfers by allocating
and DMA-mapping a dummy rx_buf or tx_buf.  This costs performance.

Resolve by pre-allocating reusable DMA descriptors which cyclically
clear the RX FIFO (for TX-only transfers) or zero-fill the TX FIFO
(for RX-only transfers).

Patch [07/10] provides some numbers for the achieved latency improvement
and CPU time reduction with an SPI Ethernet controller.  SPI displays
should see a similar speedup.  I've also made an effort to reduce
peripheral and memory bus accesses.

The series is meant to be applied on top of broonie/for-next.
It can be applied to Linus' current tree if commit 8d8bef503658
("spi: bcm2835: Fix 3-wire mode if DMA is enabled") is cherry-picked
from broonie's repo beforehand.

Please review and test.  Thank you.

Lukas Wunner (10):
  dmaengine: bcm2835: Allow reusable descriptors
  dmaengine: bcm2835: Allow cyclic transactions without interrupt
  spi: Guarantee cacheline alignment of driver-private data
  spi: bcm2835: Drop dma_pending flag
  spi: bcm2835: Work around DONE bit erratum
  spi: bcm2835: Cache CS register value for ->prepare_message()
  spi: bcm2835: Speed up TX-only DMA transfers by clearing RX FIFO
  dmaengine: bcm2835: Document struct bcm2835_dmadev
  dmaengine: bcm2835: Avoid accessing memory when copying zeroes
  spi: bcm2835: Speed up RX-only DMA transfers by zero-filling TX FIFO

 drivers/dma/bcm2835-dma.c |  38 +++-
 drivers/spi/spi-bcm2835.c | 408 ++++++++++++++++++++++++++++++++------
 drivers/spi/spi.c         |  18 +-
 3 files changed, 390 insertions(+), 74 deletions(-)

-- 
2.20.1