On Fri, 10 Aug 2018, Laurent Pinchart wrote:

> > > Aren't you're missing a dma_sync_single_for_device() call before
> > > submitting the URB ? IIRC that's required for correct operation of the DMA
> > > mapping API on some platforms, depending on the cache architecture. The
> > > additional sync can affect performances, so it would be useful to re-run
> > > the perf test.
> >
> > This was already discussed:
> >
> > https://lkml.org/lkml/2018/7/23/1051
> >
> > I rely on Alan's reply:
> >
> > > According to Documentation/DMA-API-HOWTO.txt, the CPU should not write
> > > to a DMA_FROM_DEVICE-mapped area, so dma_sync_single_for_device() is
> > > not needed.
>
> I fully agree that the CPU should not write to the buffer. However, I think
> the sync call is still needed. It's been a long time since I touched this
> area, but IIRC, some cache architectures (VIVT ?) require both cache clean
> before the transfer and cache invalidation after the transfer. On platforms
> where no cache management operation is needed before the transfer in the
> DMA_FROM_DEVICE direction, the dma_sync_*_for_device() calls should be no-ops
> (and if they're not, it's a bug of the DMA mapping implementation).

In general, I agree that the cache has to be clean before a transfer
starts.  This means some sort of mapping operation (like
dma_sync_*_for_device) is indeed required at some point between the
allocation and the first transfer.

For subsequent transfers, however, the cache is already clean and it
will remain clean because the CPU will not do any writes to the buffer.
(Note: clean != empty.  Rather, clean == !dirty.)  Therefore transfers
following the first should not need any dma_sync_*_for_device.

If you don't accept this reasoning then you should ask the people who
wrote DMA-API-HOWTO.txt.  They certainly will know more about this
issue than I do.

Alan Stern
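
P.S. For what it's worth, the clean-versus-dirty argument above can be
sketched as a toy userspace model.  This is only an illustration under
simplifying assumptions: it is not kernel code, every name in it
(toy_buf, cpu_write, sync_for_device, count_unsafe_transfers, and so
on) is hypothetical, and it reduces a write-back cache to two flags.
It assumes the buffer starts out dirty (say, zeroed by the CPU at
allocation time) and that the device transfer is only safe when no
dirty lines remain.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one DMA_FROM_DEVICE buffer behind a write-back cache.
 * Hypothetical names throughout; this is NOT the kernel DMA API. */
struct toy_buf {
	bool cached;	/* some of the buffer's lines are in the cache */
	bool dirty;	/* some cached lines hold data not yet written back */
};

/* A CPU write pulls lines into the cache and dirties them. */
static void cpu_write(struct toy_buf *b) { b->cached = true; b->dirty = true; }

/* A CPU read may pull lines into the cache but never dirties them. */
static void cpu_read(struct toy_buf *b) { b->cached = true; }

/* Stand-in for dma_sync_*_for_device: clean (write back) dirty lines. */
static void sync_for_device(struct toy_buf *b) { b->dirty = false; }

/* Stand-in for dma_sync_*_for_cpu: invalidate the cached lines. */
static void sync_for_cpu(struct toy_buf *b) { b->cached = false; b->dirty = false; }

/* A device write to memory is unsafe while dirty lines exist, because a
 * later eviction could overwrite the data the device just stored. */
static bool device_transfer_is_safe(const struct toy_buf *b) { return !b->dirty; }

/* Run n transfers with sync_for_device issued only before the first one.
 * Returns how many transfers started with a dirty cache. */
static int count_unsafe_transfers(int n, bool cpu_writes_between)
{
	/* Assumption: the freshly allocated buffer was touched by the CPU. */
	struct toy_buf b = { .cached = true, .dirty = true };
	int unsafe = 0;

	assert(n >= 0);
	sync_for_device(&b);		/* once, before the first transfer */

	for (int i = 0; i < n; i++) {
		if (!device_transfer_is_safe(&b))
			unsafe++;
		sync_for_cpu(&b);	/* after every transfer */
		cpu_read(&b);		/* the driver consumes the data */
		if (cpu_writes_between)
			cpu_write(&b);	/* violates the DMA_FROM_DEVICE rule */
	}
	return unsafe;
}
```

In this model, count_unsafe_transfers(5, false) reports no unsafe
transfers, which is the case described above: reads alone never dirty
the cache.  Passing true makes the CPU write between transfers, and
every transfer after the first then starts with a dirty cache.  Whether
real hardware matches such a simple model is exactly the question for
the DMA-API-HOWTO.txt authors.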