On Fri, Jun 18, 2021 at 01:44:08PM +0900, Tomasz Figa wrote:
> > Well, dma_alloc_coherent users want a non-cached mapping.  And while
> > some architectures provide that using a vmap with "uncached" bits in
> > the PTE, this:
> >
> >  a) is not possible everywhere
> >  b) even where possible is not always the best idea as it creates
> >     mappings with different cacheability bits
>
> I think this could be addressed by having a dma_vmap() helper that
> does the right thing, whether it's vmap() or dma_common_pages_remap()
> as appropriate. Or would this still be insufficient for some
> architectures?

It can't always do the right thing.  E.g. for the case where uncached
memory needs to be allocated from a special boot-time fixed pool.

> > And even without that dma_alloc_noncoherent causes less overhead than
> > dma_alloc_noncontiguous if you only need a single contiguous range.
>
> Given that behind the scenes dma_alloc_noncontiguous() would also just
> call __dma_alloc_pages() for devices that need contiguous pages, would
> the overhead be basically the creation of a single-entry sgtable?

In the best case: yes.

> > So while I'm happy we have something useful for more complex drivers
> > like v4l, I think the simple dma_alloc_coherent API, including some of
> > the less crazy flags for dma_alloc_attrs, is the right thing to use for
> > more than 90% of the use cases.
>
> One thing to take into account here is that many drivers use the
> existing "simple" way, just because there wasn't a viable alternative
> to do something better. Agreed, though, that we shouldn't optimize for
> the rare cases.

While that might be true for a few drivers, it is absolutely not true for
the vast majority.  I think you media people are a little special, with
only the GPU folks contending for "specialness" :)  (although media handles
it way better, GPU folks just create local hacks that can't work portably).