On Sat, Aug 04, 2018 at 10:46:35AM -0400, Alan Stern wrote:
> > 2) dma_unmap and dma_map in the handler:
> > 2A) dma_unmap_single call: 28.8 +- 1.5 usec
> > 2B) memcpy and the rest: 58 +- 6 usec
> > 2C) dma_map_single call: 22 +- 2 usec
> > Total: 110 +- 7 usec
> >
> > 3) dma_sync_single_for_cpu
> > 3A) dma_sync_single_for_cpu call: 29.4 +- 1.7 usec
> > 3B) memcpy and the rest: 59 +- 6 usec
> > 3C) noop (trace events overhead): 5 +- 2 usec
> > Total: 93 +- 7 usec
> >
> > So, now we see that 2A and 3A (as well as 2B and 3B) agree good within
> > error ranges.
>
> Taken together, those measurements look like a pretty good argument for
> always using dma_sync_single_for_cpu in the driver.  Provided results
> on other platforms aren't too far out of line with these results.

Logically speaking on no-mmio no-swiotlb platforms dma_sync_single_for_cpu
and dma_unmap should always be identical.  With the migration towards
everyone using dma-direct and dma-noncoherent this is actually going to be
enforced, and I plan to move that enforcement to common code in the next
merge window or two.
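
For anyone who wants to sanity-check the quoted figures, here is a minimal sketch that re-adds the per-step timings and tests the "agree within error ranges" claim. The numbers are copied from the measurements above. The helper names (combine, agree) are made up for illustration, and combining errors in quadrature is my assumption: the thread does not say how the totals or their errors were obtained.

```python
import math

# (mean, error) in usec, copied from the quoted measurements.
unmap_map = {"2A": (28.8, 1.5), "2B": (58.0, 6.0), "2C": (22.0, 2.0)}
sync_cpu = {"3A": (29.4, 1.7), "3B": (59.0, 6.0), "3C": (5.0, 2.0)}


def combine(steps):
    """Sum the means and add the errors in quadrature.

    Assumption: the per-step errors are independent.
    """
    total = sum(mean for mean, _ in steps.values())
    err = math.sqrt(sum(e * e for _, e in steps.values()))
    return total, err


def agree(a, b):
    """True if two measurements differ by no more than their combined error."""
    return abs(a[0] - b[0]) <= math.hypot(a[1], b[1])


for name, steps in (("unmap+map", unmap_map), ("sync_for_cpu", sync_cpu)):
    total, err = combine(steps)
    print(f"{name}: {total:.1f} +- {err:.1f} usec")

print("2A vs 3A agree:", agree(unmap_map["2A"], sync_cpu["3A"]))
print("2B vs 3B agree:", agree(unmap_map["2B"], sync_cpu["3B"]))
```

Under that assumption the summed steps come to roughly 108.8 +- 6.5 usec and 93.4 +- 6.5 usec, consistent with the reported totals of 110 +- 7 and 93 +- 7, and both the 2A/3A and 2B/3B pairs fall within their combined errors.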