On Fri, Nov 30, 2018 at 10:35:27AM +0100, Daniel Vetter wrote:
> > Whether the cache maintenance operation needs to actually do anything
> > or not is a function of `dev`.  We can have some devices that are
> > coherent with CPU caches, and some that are not, on the same system.
>
> So the thing is that the gpu driver knows this too. It fairly often can
> even decide at runtime (through surface state bits or gpu pte bits)
> whether to use coherent or non-coherent transactions. dma-api assuming
> that a given device is always coherent or non-coherent is one of the
> fundamental mismatches we have.
>
> If you otoh need dev because there's some strange bridge caches you need
> to flush (I've never seen that, but who knows), that would be a different
> thing. All the bridge flushing I've seen is attached to the iommu though,
> so would be really a surprise if the cache management needs that too.

Strange bridge caches aren't the problem.  Outside of magic components
like SoC-integrated GPUs, the issue is that a platform can wire up a
PCIe, AXI, etc. bus either so that it is cache coherent, or so that it
is not (it does not snoop the CPU caches).  Drivers need to use the full
DMA API, including dma_sync_*, to cater for the non-coherent case; those
calls turn into no-ops if DMA is coherent.

Now, PCIe also has unsnooped transactions, which can be non-coherent even
if the bus would otherwise be coherent.  We have so far very much ignored
those in Linux (at least in Linux in general; I know you guys have some
driver-local hacks), but if that use case, or similar ones for GPUs on
SoCs, becomes common, we need to find a solution.  One of the major
issues here is that architectures that are always DMA coherent might not
even have cache flushing instructions, or, even if they do, we have not
wired them up in the DMA code because we didn't need them.
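To make the "no-op if coherent" behaviour concrete, here is a minimal userspace model, not kernel code. It assumes a simplified device with a single, fixed per-device coherence flag (the very assumption Daniel calls a mismatch), and every name in it (`model_device`, `model_dma_sync_for_device`, `model_arch_sync_for_device`) is hypothetical rather than part of the real DMA API.

```c
/*
 * Userspace model (NOT kernel code) of a dma_sync_*-style call:
 * cache maintenance runs only for non-coherent devices and is a
 * no-op for coherent ones.  All names here are hypothetical.
 */
#include <stdbool.h>
#include <stddef.h>

struct model_device {
	const char *name;
	bool dma_coherent;      /* fixed per device by how the bus is wired */
	unsigned int flushes;   /* counts simulated cache-maintenance ops */
};

/*
 * Hypothetical stand-in for the arch cache writeback step; a real
 * implementation would write back CPU cache lines covering buf..buf+len.
 */
static void model_arch_sync_for_device(struct model_device *dev,
				       const void *buf, size_t len)
{
	(void)buf;
	(void)len;
	dev->flushes++;
}

/*
 * Driver-facing call: drivers always make it, but it only does work
 * when the device does not snoop the CPU caches.
 */
static void model_dma_sync_for_device(struct model_device *dev,
				      const void *buf, size_t len)
{
	if (dev->dma_coherent)
		return;	/* snooping bus: nothing to do */
	model_arch_sync_for_device(dev, buf, len);
}
```

A driver issues the same `model_dma_sync_for_device()` call for both kinds of device; only the non-coherent one ends up with a non-zero `flushes` count.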
So what we'd need to support this properly is to do something like:

 - add a new arch hook that allows an architecture to say it supports
   optional non-coherent DMA at the arch level, and for a given device
 - wire up arch_sync_dma_for_{device,cpu} for those architectures that
   define it if they don't currently have it (e.g. x86)
 - add a new DMA_ATTR_* flag to opt into cache flushing even if the
   device declares it is otherwise coherent

And I'd really like to see that work driven by an actual user.
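The three steps above combine into one decision per sync call. The following is a hedged userspace sketch of that decision, not a proposed patch: `MODEL_ATTR_FORCE_CACHE_SYNC` is a hypothetical placeholder for the unnamed new DMA_ATTR_* flag, and the struct fields stand in for the proposed arch hook and for whether arch_sync_dma_for_{device,cpu} is wired up. Reporting an unsupported opt-in instead of silently ignoring it is also an assumption of this sketch.

```c
/*
 * Userspace model (NOT kernel code) of the proposal.  All names are
 * hypothetical; MODEL_ATTR_FORCE_CACHE_SYNC stands in for the unnamed
 * new DMA_ATTR_* flag.
 */
#include <stdbool.h>

#define MODEL_ATTR_FORCE_CACHE_SYNC (1UL << 0)

enum model_sync_decision {
	MODEL_SYNC_NONE,         /* coherent, no opt-in: no-op as today */
	MODEL_SYNC_FLUSH,        /* perform cache maintenance */
	MODEL_SYNC_UNSUPPORTED,  /* opt-in requested but not possible here */
};

struct model_arch {
	/* Stand-in for the new arch hook: optional non-coherent DMA? */
	bool supports_optional_noncoherent;
	/* Whether arch_sync_dma_for_{device,cpu}-style helpers are wired up. */
	bool has_sync_helpers;
};

struct model_device {
	bool dma_coherent;             /* platform-declared coherence */
	bool optional_noncoherent_ok;  /* per-device answer of the arch hook */
};

static enum model_sync_decision
model_sync_decision(const struct model_arch *arch,
		    const struct model_device *dev, unsigned long attrs)
{
	if (!dev->dma_coherent)
		return MODEL_SYNC_FLUSH;	/* existing non-coherent path */
	if (!(attrs & MODEL_ATTR_FORCE_CACHE_SYNC))
		return MODEL_SYNC_NONE;		/* coherent, no opt-in */
	/* Opt-in is only honoured where arch and device both allow it. */
	if (arch->supports_optional_noncoherent && arch->has_sync_helpers &&
	    dev->optional_noncoherent_ok)
		return MODEL_SYNC_FLUSH;
	return MODEL_SYNC_UNSUPPORTED;
}
```

On an architecture whose sync helpers were never wired up, a coherent device that opts in gets `MODEL_SYNC_UNSUPPORTED`, which is exactly the gap the second step of the list is meant to close.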