Hi Russel, On Fri, 2018-05-18 at 18:50 +0100, Russell King - ARM Linux wrote: > It's necessary. Take a moment to think carefully about this: > > dma_map_single(, dir) > > dma_sync_single_for_cpu(, dir) > > dma_sync_single_for_device(, dir) > > dma_unmap_single(, dir) > > In the case of a DMA-incoherent architecture, the operations done at each > stage depend on the direction argument: > > map for_cpu for_device unmap > TO_DEV writeback none writeback none > TO_CPU invalidate invalidate* invalidate invalidate* > BIDIR writeback invalidate writeback invalidate > > * - only necessary if the CPU speculatively prefetches. I think invalidation of DMA buffer is required on for_cpu(TO_CPU) even if CPU doesn't preferch - what if we reuse the same buffer for multiple reads from DMA device? > The multiple invalidations for the TO_CPU case handles different > conditions that can result in data corruption, and for some CPUs, all > four are necessary. I would agree that map()/unmap() a quite a special cases and so depending on direction we need to execute in them either for_cpu() or for_device() call-backs depending on direction. As for invalidation in case of for_device(TO_CPU) I still don't see a rationale behind it. Would be interesting to see a real example where we benefit from this. > This is what is implemented for 32-bit ARM, depending on the CPU > capabilities, as we have DMA incoherent devices and we have CPUs that > speculatively prefetch data, and so may load data into the caches while > DMA is in operation. > > > Things get more interesting if the implementation behind the DMA API has > to copy data between the buffer supplied to the mapping and some DMA > accessible buffer: > > map for_cpu for_device unmap > TO_DEV copy to dma none copy to dma none > TO_CPU none copy to cpu none copy to cpu > BIDIR copy to dma copy to cpu copy to dma copy to cpu > > So, in both cases, the value of the direction argument defines what you > need to do in each call. Interesting enough in your seond table (which describes more complicated case indeed) you set "none" for for_device(TO_CPU) which looks logical to me. So IMHO that's what make sense: ---------------------------->8----------------------------- map for_cpu for_device unmap TO_DEV writeback none writeback none TO_CPU none invalidate none invalidate* BIDIR writeback invalidate writeback invalidate* ---------------------------->8----------------------------- * is the case for prefetching CPU. -Alexey