2018-08-17 20:44 GMT+03:00 Matwey V. Kornilov <matwey.kornilov@xxxxxxxxx>: > пт, 10 авг. 2018 г. в 17:27, Alan Stern <stern@xxxxxxxxxxxxxxxxxxx>: >> >> On Fri, 10 Aug 2018, Laurent Pinchart wrote: >> >> > > > Aren't you're missing a dma_sync_single_for_device() call before >> > > > submitting the URB ? IIRC that's required for correct operation of the DMA >> > > > mapping API on some platforms, depending on the cache architecture. The >> > > > additional sync can affect performances, so it would be useful to re-run >> > > > the perf test. >> > > >> > > This was already discussed: >> > > >> > > https://lkml.org/lkml/2018/7/23/1051 >> > > >> > > I rely on Alan's reply: >> > > >> > > > According to Documentation/DMA-API-HOWTO.txt, the CPU should not write >> > > > to a DMA_FROM_DEVICE-mapped area, so dma_sync_single_for_device() is >> > > > not needed. >> > >> > I fully agree that the CPU should not write to the buffer. However, I think >> > the sync call is still needed. It's been a long time since I touched this >> > area, but IIRC, some cache architectures (VIVT ?) require both cache clean >> > before the transfer and cache invalidation after the transfer. On platforms >> > where no cache management operation is needed before the transfer in the >> > DMA_FROM_DEVICE direction, the dma_sync_*_for_device() calls should be no-ops >> > (and if they're not, it's a bug of the DMA mapping implementation). >> >> In general, I agree that the cache has to be clean before a transfer >> starts. This means some sort of mapping operation (like >> dma_sync_*_for-device) is indeed required at some point between the >> allocation and the first transfer. >> >> For subsequent transfers, however, the cache is already clean and it >> will remain clean because the CPU will not do any writes to the buffer. >> (Note: clean != empty. Rather, clean == !dirty.) Therefore transfers >> following the first should not need any dma_sync_*_for_device. >> >> If you don't accept this reasoning then you should ask the people who >> wrote DMA-API-HOWTO.txt. They certainly will know more about this >> issue than I do. > > Laurent, > > I have not seen any data corruption or glitches without > dma_sync_single_for_device() on ARM and x86. > It takes additional ~20usec for dma_sync_single_for_device() to run on > ARM. So It would be great to ensure that we can reliable save another > 15% running time. DMA-API-HOWTO.txt has and example with my_card_interrupt_handler() function where it says that "CPU should not write to DMA_FROM_DEVICE-mapped area, so dma_sync_single_for_device() is not needed here." I've found that, for instance, drivers/crypto/caam/caamrng.c follows DMA-API-HOWTO.txt and don't use dma_sync_single_for_device() in the same case as we have in pwc. > >> >> Alan Stern >> > > > -- > With best regards, > Matwey V. Kornilov -- With best regards, Matwey V. Kornilov