On Fri, May 18, 2018 at 07:57:34PM +0000, Alexey Brodkin wrote: > Hi Russel, That's Russell. > On Fri, 2018-05-18 at 18:50 +0100, Russell King - ARM Linux wrote: > > It's necessary. Take a moment to think carefully about this: > > > > dma_map_single(, dir) > > > > dma_sync_single_for_cpu(, dir) > > > > dma_sync_single_for_device(, dir) > > > > dma_unmap_single(, dir) > > > > In the case of a DMA-incoherent architecture, the operations done at each > > stage depend on the direction argument: > > > > map for_cpu for_device unmap > > TO_DEV writeback none writeback none > > TO_CPU invalidate invalidate* invalidate invalidate* > > BIDIR writeback invalidate writeback invalidate > > > > * - only necessary if the CPU speculatively prefetches. > > I think invalidation of DMA buffer is required on for_cpu(TO_CPU) even > if CPU doesn't preferch - what if we reuse the same buffer for multiple > reads from DMA device? That's fine - for non-coherent DMA, the CPU caches will only end up containing data for that memory if: - the CPU speculatively fetches data from that memory, or - the CPU explicitly touches that memory > > The multiple invalidations for the TO_CPU case handles different > > conditions that can result in data corruption, and for some CPUs, all > > four are necessary. > > I would agree that map()/unmap() a quite a special cases and so depending > on direction we need to execute in them either for_cpu() or for_device() > call-backs depending on direction. > > As for invalidation in case of for_device(TO_CPU) I still don't see > a rationale behind it. Would be interesting to see a real example where > we benefit from this. Yes, you could avoid that, but depending how you structure the architecture implementation, it can turn out to be a corner case. The above table is precisely how 32-bit ARM is implemented, because the way we implement them is based on who owns the memory - the "map" and "for_device" operations translate internally to a cpu-to-device ownership transition of the buffer. Similar for "unmap" and "to_cpu". It basically avoids having to add additional functions at the lower implementation levels. > > Things get more interesting if the implementation behind the DMA API has > > to copy data between the buffer supplied to the mapping and some DMA > > accessible buffer: > > > > map for_cpu for_device unmap > > TO_DEV copy to dma none copy to dma none > > TO_CPU none copy to cpu none copy to cpu > > BIDIR copy to dma copy to cpu copy to dma copy to cpu > > > > So, in both cases, the value of the direction argument defines what you > > need to do in each call. > > Interesting enough in your seond table (which describes more complicated > case indeed) you set "none" for for_device(TO_CPU) which looks logical > to me. > > So IMHO that's what make sense: > ---------------------------->8----------------------------- > map for_cpu for_device unmap > TO_DEV writeback none writeback none > TO_CPU none invalidate none invalidate* > BIDIR writeback invalidate writeback invalidate* > ---------------------------->8----------------------------- That doesn't make sense for the TO_CPU case. If the caches contain dirty cache lines, and you're DMAing from the device to the system RAM, other system activity can cause the dirty cache lines to be evicted (written back) to memory which the DMA has already overwritten. The result is data corruption. So, you really can't have "none" in the "map" case there. Given that, the "for_cpu" case becomes dependent on whether the CPU speculatively prefetches. -- RMK's Patch system: http://www.armlinux.org.uk/developer/patches/ FTTC broadband for 0.8mile line in suburbia: sync at 8.8Mbps down 630kbps up According to speedtest.net: 8.21Mbps down 510kbps up