Hi Russel, On Fri, 2018-05-18 at 18:50 +0100, Russell King - ARM Linux wrote:
It's necessary. Take a moment to think carefully about this: dma_map_single(, dir) dma_sync_single_for_cpu(, dir) dma_sync_single_for_device(, dir) dma_unmap_single(, dir) In the case of a DMA-incoherent architecture, the operations done at each stage depend on the direction argument: map for_cpu for_device unmap TO_DEV writeback none writeback none TO_CPU invalidate invalidate* invalidate invalidate* BIDIR writeback invalidate writeback invalidate * - only necessary if the CPU speculatively prefetches.
I think invalidation of DMA buffer is required on for_cpu(TO_CPU) even if CPU doesn't preferch - what if we reuse the same buffer for multiple reads from DMA device?
The multiple invalidations for the TO_CPU case handles different conditions that can result in data corruption, and for some CPUs, all four are necessary.
I would agree that map()/unmap() a quite a special cases and so depending on direction we need to execute in them either for_cpu() or for_device() call-backs depending on direction. As for invalidation in case of for_device(TO_CPU) I still don't see a rationale behind it. Would be interesting to see a real example where we benefit from this.
This is what is implemented for 32-bit ARM, depending on the CPU capabilities, as we have DMA incoherent devices and we have CPUs that speculatively prefetch data, and so may load data into the caches while DMA is in operation. Things get more interesting if the implementation behind the DMA API has to copy data between the buffer supplied to the mapping and some DMA accessible buffer: map for_cpu for_device unmap TO_DEV copy to dma none copy to dma none TO_CPU none copy to cpu none copy to cpu BIDIR copy to dma copy to cpu copy to dma copy to cpu So, in both cases, the value of the direction argument defines what you need to do in each call.
Interesting enough in your seond table (which describes more complicated case indeed) you set "none" for for_device(TO_CPU) which looks logical to me. So IMHO that's what make sense: ---------------------------->8----------------------------- map for_cpu for_device unmap TO_DEV writeback none writeback none TO_CPU none invalidate none invalidate* BIDIR writeback invalidate writeback invalidate* ---------------------------->8----------------------------- * is the case for prefetching CPU. -Alexey��.n��������+%������w��{.n�����{��n����ܨ}���Ơz�j:+v�����w����ޙ��&�)ߡ�a����z�ޗ���ݢj��w�f