[SNIP]
The problem was then that dma_buf_unmap_attachment cannot be called before the dma_fence is signaled, and calling it after is already too late (because the fence would be signaled before the data is sync'd).
Well, what sync are you talking about? CPU sync? In DMA-buf that is handled differently. For importers it's mandatory that they can be coherent with the exporter. That usually means they can snoop the CPU cache if the exporter can snoop the CPU cache.
I seem to have such a system where one device can snoop the CPU cache and the other cannot. Therefore if I want to support it properly, I do need cache flush/sync. I don't actually try to access the data using the CPU (and when I do, I call the sync start/end ioctls).
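For reference, the "sync start/end ioctls" mentioned above are the userspace DMA_BUF_IOCTL_SYNC calls that bracket a CPU access window. The following is a minimal sketch, assuming a Linux system with the <linux/dma-buf.h> UAPI header installed; the helper name dmabuf_cpu_sync() is hypothetical and only wraps the ioctl.

```c
#include <errno.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/dma-buf.h>

/*
 * Hypothetical helper (not part of any library): open or close a CPU
 * access window on a DMA-buf file descriptor.
 *
 * start  - non-zero for DMA_BUF_SYNC_START, zero for DMA_BUF_SYNC_END
 * access - DMA_BUF_SYNC_READ, DMA_BUF_SYNC_WRITE or DMA_BUF_SYNC_RW
 *
 * Returns 0 on success, -1 with errno set on failure.
 */
static int dmabuf_cpu_sync(int dmabuf_fd, int start, unsigned long long access)
{
    struct dma_buf_sync sync;
    int ret;

    memset(&sync, 0, sizeof(sync));
    sync.flags = (start ? DMA_BUF_SYNC_START : DMA_BUF_SYNC_END) | access;

    /* Restart the ioctl if it was interrupted before completing. */
    do {
        ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_SYNC, &sync);
    } while (ret == -1 && (errno == EINTR || errno == EAGAIN));

    return ret;
}
```

A CPU read of a mapped buffer would then be wrapped as dmabuf_cpu_sync(fd, 1, DMA_BUF_SYNC_READ), the access itself, and dmabuf_cpu_sync(fd, 0, DMA_BUF_SYNC_READ). This only covers CPU access; it does not address the device-to-device case discussed in this thread.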
Usually that isn't a problem as long as you don't access the data with the CPU.
[SNIP]
(and I *think* there is a way to force coherency in the Ultrascale's interconnect - we're investigating it)
What you can do, instead of using udmabuf or dma-heaps, is to have the device which can't provide coherency act as the exporter of the buffers. The exporter is allowed to call sync_for_cpu/sync_for_device on its own buffers and also gets begin/end CPU access notifications. So you can then handle coherency between the exporter and the CPU.
But again that would only work if the importers would call begin_cpu_access() / end_cpu_access(), which they don't, because they don't actually access the data using the CPU.
Wow, that is a completely new use case then.
Neither DMA-buf nor the DMA subsystem in Linux actually supports this as far as I can see.
Unless you mean that the exporter can call sync_for_cpu/sync_for_device before/after every single DMA transfer so that the data appears coherent to the importers, without them having to call begin_cpu_access() / end_cpu_access().
Yeah, I mean the importers don't have to call begin_cpu_access() / end_cpu_access() if they don't do CPU access :)
What you can still do as exporter is to call sync_for_device() and sync_for_cpu() before and after each operation on your non-coherent device. Paired with the fence signaling that should still work fine then.
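The ordering suggested here can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions, not kernel code: every model_* name is hypothetical, the "cache" and "ram" arrays stand in for what coherent and non-coherent agents observe, and in a real exporter the corresponding steps would presumably be calls such as dma_sync_sgtable_for_device(), dma_sync_sgtable_for_cpu() and dma_fence_signal().

```c
#include <stdbool.h>
#include <string.h>

#define BUF_SIZE 16

/* Toy model, not kernel code: one buffer seen through two views. */
struct model_buf {
    unsigned char ram[BUF_SIZE];   /* what the non-coherent device reads/writes */
    unsigned char cache[BUF_SIZE]; /* what the CPU and snooping devices see */
    bool fence_signaled;
};

/* Stand-in for a sync_for_device() step: write cached data back to RAM. */
static void model_sync_for_device(struct model_buf *b)
{
    memcpy(b->ram, b->cache, BUF_SIZE);
}

/* Stand-in for a sync_for_cpu() step: refresh the cached view from RAM. */
static void model_sync_for_cpu(struct model_buf *b)
{
    memcpy(b->cache, b->ram, BUF_SIZE);
}

/* The non-coherent device bypasses the cache and fills RAM directly. */
static void model_noncoherent_dma_write(struct model_buf *b, unsigned char value)
{
    memset(b->ram, value, BUF_SIZE);
}

/* Exporter-side transfer: sync, DMA, sync, and only then signal the fence. */
static void model_exporter_transfer(struct model_buf *b, unsigned char value)
{
    model_sync_for_device(b);
    model_noncoherent_dma_write(b, value);
    model_sync_for_cpu(b);
    b->fence_signaled = true;
}

/* A coherent importer waits for the fence, then reads the cached view. */
static int model_importer_read(const struct model_buf *b)
{
    return b->fence_signaled ? b->cache[0] : -1;
}
```

The point of the model is the order inside model_exporter_transfer(): because the fence is signaled only after the second sync, an importer that waits on the fence never observes stale data, and it does not need to call begin_cpu_access() / end_cpu_access() itself.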
But taking a step back, this use case is not something even the low level DMA subsystem supports. That sync_for_cpu() does the right thing is a coincidence and not proper engineering.
What you need is a sync_device_to_device() which does the appropriate actions depending on which devices are involved.
In which case, this would still multiply the complexity; my USB-functionfs interface here (and IIO interface in the separate patchset) are not device-specific, so I'd rather keep them importers.
If you really don't have coherency between devices then that would be a really new use case and we would need much more agreement on how to do this.
[snip]
Agreed. Designing a good generic solution would be better. With that said...
Let's keep it out of this USB-functionfs interface for now. The interface does work perfectly fine on platforms that don't have coherency problems. The coherency issue in itself really is a tangential issue.
Yeah, completely agree.
So I will send a v6 where I don't try to force the cache coherency - and instead assume that the attached devices are coherent between themselves. But it would be even better to have a way to detect non-coherency and return an error on attach.
Take a look into the DMA subsystem. I'm pretty sure we already have something like this in there.
If nothing else helps you could check whether the coherent memory access mask is non-zero or something like that.
Regards,
Christian.
Cheers, -Paul