пт, 10 авг. 2018 г. в 17:27, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>: > > On Fri, 10 Aug 2018, Laurent Pinchart wrote: > > > > > Aren't you're missing a dma_sync_single_for_device() call before > > > > submitting the URB ? IIRC that's required for correct operation of the DMA > > > > mapping API on some platforms, depending on the cache architecture. The > > > > additional sync can affect performances, so it would be useful to re-run > > > > the perf test. > > > > > > This was already discussed: > > > > > > https://lkml.org/lkml/2018/7/23/1051 > > > > > > I rely on Alan's reply: > > > > > > > According to Documentation/DMA-API-HOWTO.txt, the CPU should not write > > > > to a DMA_FROM_DEVICE-mapped area, so dma_sync_single_for_device() is > > > > not needed. > > > > I fully agree that the CPU should not write to the buffer. However, I think > > the sync call is still needed. It's been a long time since I touched this > > area, but IIRC, some cache architectures (VIVT ?) require both cache clean > > before the transfer and cache invalidation after the transfer. On platforms > > where no cache management operation is needed before the transfer in the > > DMA_FROM_DEVICE direction, the dma_sync_*_for_device() calls should be no-ops > > (and if they're not, it's a bug of the DMA mapping implementation). > > In general, I agree that the cache has to be clean before a transfer > starts. This means some sort of mapping operation (like > dma_sync_*_for-device) is indeed required at some point between the > allocation and the first transfer. > > For subsequent transfers, however, the cache is already clean and it > will remain clean because the CPU will not do any writes to the buffer. > (Note: clean != empty. Rather, clean == !dirty.) Therefore transfers > following the first should not need any dma_sync_*_for_device. > > If you don't accept this reasoning then you should ask the people who > wrote DMA-API-HOWTO.txt. They certainly will know more about this > issue than I do. Laurent, I have not seen any data corruption or glitches without dma_sync_single_for_device() on ARM and x86. It takes additional ~20usec for dma_sync_single_for_device() to run on ARM. So It would be great to ensure that we can reliable save another 15% running time. > > Alan Stern > -- With best regards, Matwey V. Kornilov