On Sun, Aug 05, 2018 at 11:33:38AM +0300, Matwey V. Kornilov wrote:
> >> Taken together, those measurements look like a pretty good argument for
> >> always using dma_sync_single_for_cpu in the driver. Provided results
> >> on other platforms aren't too far out of line with these results.
> >
> > Logically speaking on no-mmio no-swiotlb platforms dma_sync_single_for_cpu
> > and dma_unmap should always be identical. With the migration towards
> > everyone using dma-direct and dma-noncoherent this is actually going to
> > be enforced, and I plan to move that enforcement to common code in the
> > next merge window or two.
> >
>
> I think that Alan means that using dma_sync_single_for_cpu() we save
> time required for subsequent dma_map() call (which is required when we
> do dma_unmap()).

The point still stands. By definition, for a correct DMA API
implementation a dma_sync_single_for_cpu/dma_sync_single_for_device pair
is always going to be cheaper than a dma_unmap/dma_map pair, although in
many cases the difference might be so small that it is not measurable.

If you reuse a buffer, using dma_sync_single* is always the right thing
to do vs unmapping and remapping it.
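
To make the two patterns concrete, here is a rough userspace sketch in C.
It is not driver code: the dma_* functions below are mock stand-ins that
only count calls (the real ones come from <linux/dma-mapping.h> and take
their arguments in the same order), and receive_with_sync() and
receive_with_remap() are hypothetical names used only for illustration.
A real driver would also have to check dma_mapping_error() after mapping.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mock types, assumed for illustration; the kernel defines the real ones. */
typedef uintptr_t dma_addr_t;
struct device { int unused; };
enum dma_data_direction { DMA_BIDIRECTIONAL, DMA_TO_DEVICE, DMA_FROM_DEVICE };

/* Call counters so the two patterns can be compared. */
struct dma_call_counts { unsigned map, unmap, sync_cpu, sync_dev; };
static struct dma_call_counts counts;

/* Mock stubs: they do no cache maintenance, they only count calls. */
static dma_addr_t dma_map_single(struct device *dev, void *ptr, size_t size,
                                 enum dma_data_direction dir)
{
    (void)dev; (void)size; (void)dir;
    assert(ptr != NULL);
    counts.map++;
    return (dma_addr_t)ptr;
}

static void dma_unmap_single(struct device *dev, dma_addr_t addr, size_t size,
                             enum dma_data_direction dir)
{
    (void)dev; (void)addr; (void)size; (void)dir;
    counts.unmap++;
}

static void dma_sync_single_for_cpu(struct device *dev, dma_addr_t addr,
                                    size_t size, enum dma_data_direction dir)
{
    (void)dev; (void)addr; (void)size; (void)dir;
    counts.sync_cpu++;
}

static void dma_sync_single_for_device(struct device *dev, dma_addr_t addr,
                                       size_t size, enum dma_data_direction dir)
{
    (void)dev; (void)addr; (void)size; (void)dir;
    counts.sync_dev++;
}

/* Reuse pattern: map once, sync around every transfer, unmap once. */
void receive_with_sync(struct device *dev, void *buf, size_t len, int transfers)
{
    dma_addr_t addr = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);

    for (int i = 0; i < transfers; i++) {
        /* ... device writes into the buffer here ... */
        dma_sync_single_for_cpu(dev, addr, len, DMA_FROM_DEVICE);
        /* ... CPU reads the received data here ... */
        dma_sync_single_for_device(dev, addr, len, DMA_FROM_DEVICE);
    }

    dma_unmap_single(dev, addr, len, DMA_FROM_DEVICE);
}

/* Remap pattern: a full map/unmap cycle for every transfer. */
void receive_with_remap(struct device *dev, void *buf, size_t len, int transfers)
{
    for (int i = 0; i < transfers; i++) {
        dma_addr_t addr = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
        /* ... device writes into the buffer here ... */
        dma_unmap_single(dev, addr, len, DMA_FROM_DEVICE);
        /* ... CPU reads the received data here ... */
    }
}
```

With the reuse pattern the mapping is set up and torn down once regardless
of how many transfers happen; only the sync calls scale with the transfer
count. The mock says nothing about actual cost, it just shows which calls
each pattern issues.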