On Thu, Mar 20, 2025, at 15:42, Mark Brown wrote: > On Thu, Mar 20, 2025 at 02:35:41PM +0100, Arnd Bergmann wrote: >> On Thu, Mar 20, 2025, at 13:29, Mark Brown wrote: > >> > On a lot of systems most transfers are short and won't be DMAed at all >> > since PIO ends up being more efficient, and most hardware is perfectly >> > happy to DMA to/from wherever so *shrug*. SPI_BUFSIZ is a maximum of 32 >> > bytes which is going to be under the copybreak limit for quite a few >> > controllers, though admittedly 16 is also a popular number, so a lot of >> > the time we don't actually DMA out of it at all. > >> I saw the same thing looked at it the other day and got confused >> about why 'local_buf' is allocated with GFP_DMA and 'buf' >> uses plain GFP_KERNEL when they are both used in the same place. > > Really we just don't expect the small buffer to be DMAed. I looked at a couple of drivers and found many have a copybreak limit less than SPI_BUFSIZE #define SPI_BUFSIZ max(32, SMP_CACHE_BYTES) drivers/spi/atmel-quadspi.c:#define ATMEL_QSPI_DMA_MIN_BYTES 16 drivers/spi/spi-at91-usart.c:#define US_DMA_MIN_BYTES 16 drivers/spi/spi-atmel.c:#define DMA_MIN_BYTES 16 drivers/spi/spi-davinci.c:#define DMA_MIN_BYTES 16 drivers/spi/spi-stm32.c:#define SPI_DMA_MIN_BYTES 16 drivers/spi/spi-imx.c: .fifo_size = 8, so any transfers from 17 to 32 bytes would try to use the non-GFP_DMA 'buf' and then try to map that. >> It also seems that the copy happens in the regmap_bulk_read() >> path but not the regmap_bulk_write(), which just passes down >> the original buffer without copying, as far as I can tell. > > Yes, writes don't need to do anything. Can you explain why writes are different from reads here? >> I think we have some corner cases where a driver allocates >> a GFP_DMA buffer, calls spi_write_then_read through regmap, >> which copies the data to the non-GFP_DMA global buffer, >> and then the SPI controller driver calls dma_map_single() >> on that, ending up with a third bounce buffer from >> swiotlb. > > I suspect that practically speaking almost everything will be under the > copybreak limit. Probably we should just make regmap use spi_sync() > with the supplied buffers here and avoid spi_write_then_read(). I'm a bit lost in that code, but doesn't spi_sync() require a buffer that can be passed into dma_map_sg() and (in theory at least) GFP_DMA? Based on what I see, every SPI transfer code paths goes through __spi_pump_transfer_message() and spi_map_msg(), which then tries to map it. This means two things: - any user passing 17 to 32 bytes into the read function either works correctly because the GFP_DMA was not needed, or it's broken and nobody noticed - the way that spi_map_buf_attrs() is written, it actually supports addresses from both kmap() and vmalloc() and will attempt to correctly map those rather than reject the buffer. While this sounds like a good idea, handling vmalloc data like this risks stack data corruption on non-coherent platforms when failing to map stack buffers would be the better response. >> I don't know what a good replacement interface would be, but >> ideally there should never be more than one copy here, >> which means that any temporary buffer would need to be >> allocated according to the dma_mask of the underlying >> bus master (dmaengine, spi controller, ...). > > Which is a pain because even if you've got the device for the SPI > controller there's no way to discover if it does it's own DMA. __spi_map_msg() already handles the case of an external DMA master through ctlr->dma_map_dev, so I think the same could be used to get a temporary buffer using dma_alloc_noncoherent() inside of spi_write_then_read() in place of the kmalloc(, GFP_DMA). Arnd