Re: [PATCH] spi: Ensure memory used for spi_write_then_read() is DMA safe

"Arnd Bergmann" <arnd@xxxxxxxx> · Thu, 20 Mar 2025 17:30:01 +0100

On Thu, Mar 20, 2025, at 15:42, Mark Brown wrote:
> On Thu, Mar 20, 2025 at 02:35:41PM +0100, Arnd Bergmann wrote:
>> On Thu, Mar 20, 2025, at 13:29, Mark Brown wrote:
>
>> > On a lot of systems most transfers are short and won't be DMAed at all
>> > since PIO ends up being more efficient, and most hardware is perfectly
>> > happy to DMA to/from wherever so *shrug*.  SPI_BUFSIZ is a maximum of 32
>> > bytes which is going to be under the copybreak limit for quite a few
>> > controllers, though admittedly 16 is also a popular number, so a lot of
>> > the time we don't actually DMA out of it at all.
>
>> I saw the same thing looked at it the other day and got confused
>> about why 'local_buf' is allocated with GFP_DMA and 'buf'
>> uses plain GFP_KERNEL when they are both used in the same place.
>
> Really we just don't expect the small buffer to be DMAed.

I looked at a couple of drivers and found many have a copybreak
limit less than SPI_BUFSIZE

#define SPI_BUFSIZ      max(32, SMP_CACHE_BYTES)

drivers/spi/atmel-quadspi.c:#define ATMEL_QSPI_DMA_MIN_BYTES    16
drivers/spi/spi-at91-usart.c:#define US_DMA_MIN_BYTES       16
drivers/spi/spi-atmel.c:#define DMA_MIN_BYTES   16
drivers/spi/spi-davinci.c:#define DMA_MIN_BYTES 16
drivers/spi/spi-stm32.c:#define SPI_DMA_MIN_BYTES       16
drivers/spi/spi-imx.c:  .fifo_size = 8,

so any transfers from 17 to 32 bytes would try to use the
non-GFP_DMA 'buf' and then try to map that.

>> It also seems that the copy happens in the regmap_bulk_read()
>> path but not the regmap_bulk_write(), which just passes down
>> the original buffer without copying, as far as I can tell.
>
> Yes, writes don't need to do anything.

Can you explain why writes are different from reads here?

>> I think we have some corner cases where a driver allocates
>> a GFP_DMA buffer, calls spi_write_then_read through regmap,
>> which copies the data to the non-GFP_DMA global buffer,
>> and then the SPI controller driver calls dma_map_single()
>> on that, ending up with a third bounce buffer from
>> swiotlb.
>
> I suspect that practically speaking almost everything will be under the
> copybreak limit.  Probably we should just make regmap use spi_sync()
> with the supplied buffers here and avoid spi_write_then_read().

I'm a bit lost in that code, but doesn't spi_sync() require
a buffer that can be passed into dma_map_sg() and (in theory
at least) GFP_DMA?

Based on what I see, every SPI transfer code paths goes
through __spi_pump_transfer_message() and spi_map_msg(),
which then tries to map it. This means two things:

- any user passing 17 to 32 bytes into the read function
  either works correctly because the GFP_DMA was not needed,
  or it's broken and nobody noticed

- the way that spi_map_buf_attrs() is written, it actually
  supports addresses from both kmap() and vmalloc() and
  will attempt to correctly map those rather than reject
  the buffer. While this sounds like a good idea, handling
  vmalloc data like this risks stack data corruption
  on non-coherent platforms when failing to map stack
  buffers would be the better response.

>> I don't know what a good replacement interface would be, but
>> ideally there should never be more than one copy here,
>> which means that any temporary buffer would need to be
>> allocated according to the dma_mask of the underlying
>> bus master (dmaengine, spi controller, ...).
>
> Which is a pain because even if you've got the device for the SPI
> controller there's no way to discover if it does it's own DMA.

__spi_map_msg() already handles the case of an external
DMA master through ctlr->dma_map_dev, so I think the same
could be used to get a temporary buffer using
dma_alloc_noncoherent() inside of spi_write_then_read()
in place of the kmalloc(, GFP_DMA).

      Arnd