On Thu, Mar 24, 2022 at 10:07 AM Toke Høiland-Jørgensen <toke@xxxxxxx> wrote: > > Right, but is that sync_for_device call really needed? Well, imagine that you have a non-cache-coherent DMA (not bounce buffers - just bad hardware)... So the driver first does that dma_sync_single_for_cpu() for the CPU see the current state (for the non-cache-coherent case it would just invalidate caches). The driver then examines the command buffer state, sees that it's still in progress, and does that return -EINPROGRESS. It's actually very natural in that situation to flush the caches from the CPU side again. And so dma_sync_single_for_device() is a fairly reasonable thing to do in that situation. But it doesn't seem *required*, no. The CPU caches only have a copy of the data in them, no writeback needed (and writeback wouldn't work since DMA from the device may be in progress). So I don't think the dma_sync_single_for_device() is *wrong* per se, because the CPU didn't actually do any modifications. But yes, I think it's unnecessary - because any later CPU accesses would need that dma_sync_single_for_cpu() anyway, which should invalidate any stale caches. And it clearly doesn't work in a bounce-buffer situation, but honestly I don't think a "CPU modified buffers concurrently with DMA" can *ever* work in that situation, so I think it's wrong for a bounce buffer model to ever do anything in the dma_sync_single_for_device() situation. Does removing that dma_sync_single_for_device() actually fix the problem for the ath driver? There's a fair number of those dma_sync_single_for_device() things all over. Could we find mis-uses and warn about them some way? It seems to be a very natural thing to do in this context, but bounce buffering does make them very fragile. Linus