On Mon, Oct 11, 2021 at 03:04:05PM +0800, Cai Huoqing wrote:
> dma_sync_ API is not called, I think the hardware may keep cache coherent
> directly or is a no cache system. No need to make performance compare.

On a device that is not attached in a cache-coherent way (and that is the
only one that matters here), dma_alloc_coherent will force every access
to the memory to be uncached, while using dma_sync_* will only do a cache
maintenance operation for each DMA submission and completion.

So yes, it matters. And Bart, who has actually looked into the numbers,
has seen the sync case to be consistently faster for a SCSI ULP.

Note that you can simplify and improve this case by using
dma_alloc_noncoherent instead of a kernel allocator + dma_map_*.
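
For illustration only, here is a rough sketch of what the
dma_alloc_noncoherent variant could look like. The foo_* names, the
struct layout and the doorbell step are made up for this example and are
not taken from any real driver; only the dma_* calls are the real DMA
API. The point is that the CPU touches the buffer through a normal cached
mapping and pays for cache maintenance once per submission and once per
completion, instead of on every access.

```c
#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/string.h>

/* Hypothetical per-buffer state; names are illustrative only. */
struct foo_buf {
	struct device *dev;
	void *vaddr;		/* cached CPU mapping */
	dma_addr_t dma;		/* address handed to the device */
	size_t size;
};

static int foo_buf_alloc(struct foo_buf *b, struct device *dev, size_t size)
{
	/* One call replaces kmalloc/alloc_pages + dma_map_single. */
	b->vaddr = dma_alloc_noncoherent(dev, size, &b->dma,
					 DMA_BIDIRECTIONAL, GFP_KERNEL);
	if (!b->vaddr)
		return -ENOMEM;
	b->dev = dev;
	b->size = size;
	return 0;
}

static int foo_buf_submit(struct foo_buf *b, const void *cmd, size_t len)
{
	if (len > b->size)
		return -EINVAL;

	/* Ordinary cached writes by the CPU. */
	memcpy(b->vaddr, cmd, len);

	/* Hand ownership to the device: one cache operation per submission. */
	dma_sync_single_for_device(b->dev, b->dma, len, DMA_BIDIRECTIONAL);

	/* ... notify the device here (hypothetical doorbell write) ... */
	return 0;
}

static int foo_buf_complete(struct foo_buf *b, void *resp, size_t len)
{
	if (len > b->size)
		return -EINVAL;

	/* Take ownership back: one cache operation per completion. */
	dma_sync_single_for_cpu(b->dev, b->dma, len, DMA_BIDIRECTIONAL);

	/* Ordinary cached reads by the CPU. */
	memcpy(resp, b->vaddr, len);
	return 0;
}

static void foo_buf_free(struct foo_buf *b)
{
	dma_free_noncoherent(b->dev, b->size, b->vaddr, b->dma,
			     DMA_BIDIRECTIONAL);
}
```

With dma_alloc_coherent the two dma_sync_single_* calls would go away,
but on a non-coherent device both memcpy calls would then hit uncached
memory, which is the cost discussed above.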