On Fri, May 18, 2018 at 10:20:02AM -0700, Vineet Gupta wrote:
> I never understood the need for this direction. And if memory serves me
> right, at that time I was seeing twice the amount of cache flushing !

It's necessary. Take a moment to think carefully about this:

	dma_map_single(, dir)

	dma_sync_single_for_cpu(, dir)

	dma_sync_single_for_device(, dir)

	dma_unmap_single(, dir)

In the case of a DMA-incoherent architecture, the operations done at each
stage depend on the direction argument:

	        map             for_cpu         for_device      unmap
	TO_DEV  writeback       none            writeback       none
	TO_CPU  invalidate      invalidate*     invalidate      invalidate*
	BIDIR   writeback       invalidate      writeback       invalidate

	* - only necessary if the CPU speculatively prefetches.

The multiple invalidations for the TO_CPU case handle different conditions
that can result in data corruption, and for some CPUs, all four are
necessary.

This is what is implemented for 32-bit ARM, depending on the CPU
capabilities, as we have DMA-incoherent devices and we have CPUs that
speculatively prefetch data, and so may load data into the caches while
DMA is in operation.

Things get more interesting if the implementation behind the DMA API has
to copy data between the buffer supplied to the mapping and some
DMA-accessible bounce buffer:

	        map             for_cpu         for_device      unmap
	TO_DEV  copy to dma     none            copy to dma     none
	TO_CPU  none            copy to cpu     none            copy to cpu
	BIDIR   copy to dma     copy to cpu     copy to dma     copy to cpu

So, in both cases, the value of the direction argument defines what you
need to do in each call.
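To make that concrete, here is a minimal sketch of the usual streaming-DMA
pattern around those four calls. The function name and the dev/buf/len
parameters are invented for illustration (this is not any real driver);
the DMA API calls themselves are the ones from <linux/dma-mapping.h>:

#include <linux/dma-mapping.h>
#include <linux/errno.h>

/* Hypothetical driver helper: receive into "buf" via streaming DMA. */
static int rx_one_buffer(struct device *dev, void *buf, size_t len)
{
	dma_addr_t handle;

	handle = dma_map_single(dev, buf, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, handle))
		return -ENOMEM;

	/* ... hand "handle" to the device, wait for the DMA to finish ... */

	/* CPU wants to look at the data while the mapping stays live */
	dma_sync_single_for_cpu(dev, handle, len, DMA_FROM_DEVICE);
	/* ... read buf ... */
	dma_sync_single_for_device(dev, handle, len, DMA_FROM_DEVICE);

	/* ... possibly more DMA ... */

	dma_unmap_single(dev, handle, len, DMA_FROM_DEVICE);
	return 0;
}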
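The first table above translates fairly directly into code. What follows
is only a sketch, not any architecture's actual implementation:
arch_wb_cache() and arch_inv_cache() are hypothetical stand-ins for
whatever writeback/invalidate primitives a CPU port provides, and while
the function names mirror the kernel's arch-side sync hooks, the bodies
here are purely illustrative:

#include <linux/dma-direction.h>
#include <linux/types.h>

/* Hypothetical low-level cache primitives provided by the CPU port. */
void arch_wb_cache(phys_addr_t paddr, size_t size);
void arch_inv_cache(phys_addr_t paddr, size_t size);

/* map and for_device take this path. */
void arch_sync_dma_for_device(phys_addr_t paddr, size_t size,
			      enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_TO_DEVICE:
	case DMA_BIDIRECTIONAL:
		/* CPU may have dirtied the buffer: push it out to RAM */
		arch_wb_cache(paddr, size);
		break;
	case DMA_FROM_DEVICE:
		/* device will overwrite it: discard stale cache lines */
		arch_inv_cache(paddr, size);
		break;
	default:
		break;
	}
}

/* for_cpu and unmap take this path. */
void arch_sync_dma_for_cpu(phys_addr_t paddr, size_t size,
			   enum dma_data_direction dir)
{
	switch (dir) {
	case DMA_FROM_DEVICE:
	case DMA_BIDIRECTIONAL:
		/*
		 * A speculatively prefetching CPU may have pulled lines
		 * back in while the DMA ran: invalidate again before the
		 * CPU reads the buffer.
		 */
		arch_inv_cache(paddr, size);
		break;
	default:
		break;
	}
}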
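The second table reduces to copies in the right direction. Again a sketch
under stated assumptions: "struct bounce_mapping" and its fields are
invented here purely to keep the example self-contained, and the real
bounce-buffer code has to track far more state than this:

#include <linux/dma-direction.h>
#include <linux/string.h>
#include <linux/types.h>

/* Hypothetical bookkeeping for one bounce-buffered mapping. */
struct bounce_mapping {
	void	*orig;		/* buffer the driver handed in */
	void	*bounce;	/* DMA-accessible copy */
	size_t	size;
};

/* map and for_device: refresh the device's copy ("copy to dma"). */
static void bounce_sync_for_device(struct bounce_mapping *m,
				   enum dma_data_direction dir)
{
	if (dir == DMA_TO_DEVICE || dir == DMA_BIDIRECTIONAL)
		memcpy(m->bounce, m->orig, m->size);
}

/* for_cpu and unmap: pull the device's writes back ("copy to cpu"). */
static void bounce_sync_for_cpu(struct bounce_mapping *m,
				enum dma_data_direction dir)
{
	if (dir == DMA_FROM_DEVICE || dir == DMA_BIDIRECTIONAL)
		memcpy(m->orig, m->bounce, m->size);
}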