On Tue, Jan 15, 2019 at 06:03:39PM +0000, Thomas Hellstrom wrote:
> In the graphics case, it's probably because it doesn't fit the graphics
> use-cases:
>
> 1) Memory typically needs to be mappable by another device. (the "dma-
> buf" interface)

And there is nothing preventing dma-buf sharing of these buffers.
Unlike the get_sgtable mess it can actually work reliably on
architectures that have virtually tagged caches and/or don't
guarantee cache coherency with mixed attribute mappings.

> 2) DMA buffers are exported to user-space and is sub-allocated by it.
> Mostly there are no GPU user-space kernel interfaces to sync / flush
> subregions and these syncs may happen on a smaller-than-cache-line
> granularity.

I know of no architectures that can do cache maintenance on a less
than cache line basis.  Either the instructions require you to
specify cache lines, or they do sometimes more, sometimes less
intelligent rounding up.

Note that as long as DMA non-coherent buffers are device owned it is
up to the device and the user space driver to take care of coherency;
the kernel is very much out of the picture.
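
To illustrate the rounding point about cache maintenance, here is a
rough userspace sketch, not kernel code.  The 64-byte line size and the
helper name round_to_cache_lines() are assumptions made up for the
example; real hardware reports its own line size and does the
equivalent rounding in the maintenance instructions themselves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size for illustration only; real CPUs report their own. */
#define CACHE_LINE_SIZE 64u

struct line_range {
	uintptr_t start;	/* first byte of the first affected line */
	uintptr_t end;		/* one past the last byte of the last affected line */
};

/*
 * Hypothetical helper: widen the byte range [addr, addr + len) to whole
 * cache lines, which is the smallest unit a maintenance operation can
 * act on.  Assumes len > 0.
 */
static struct line_range round_to_cache_lines(uintptr_t addr, size_t len)
{
	const uintptr_t mask = (uintptr_t)CACHE_LINE_SIZE - 1;
	struct line_range r;

	/* The mask arithmetic below only works for power-of-two sizes. */
	assert((CACHE_LINE_SIZE & (CACHE_LINE_SIZE - 1)) == 0);

	r.start = addr & ~mask;			/* round start down */
	r.end = (addr + len + mask) & ~mask;	/* round end up */
	return r;
}
```

With these assumptions, a request to sync 8 bytes at 0x1010 ends up
covering 0x1000 to 0x1040, i.e. the whole line, including whatever else
was sub-allocated in it.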