On Fri, Jun 18, 2021 at 1:25 PM Christoph Hellwig <hch@xxxxxx> wrote:
>
> On Fri, Jun 18, 2021 at 12:21:33PM +0900, Tomasz Figa wrote:
> > > On Thu, Jun 17, 2021 at 06:40:58PM +0900, Tomasz Figa wrote:
> > > > Sorry, I meant dma_alloc_attrs() and yes, it's indeed a misnomer. Our
> > > > use case basically has no need for the additional coherent mapping, so
> > > > creation of it can be skipped to save some vmalloc space. (Yes, it
> > > > probably only matters for 32-bit architectures.)
> > >
> > > Yes, that is the normal use case, and it is solved by using
> > > dma_alloc_noncoherent or dma_alloc_noncontiguous without the vmap
> > > step.
> >
> > True, silly me. Probably not enough coffee at the time I was looking at it.
> >
> > With that, wouldn't it be possible to completely get rid of
> > dma_alloc_{coherent,attrs}() and use dma_alloc_noncontiguous() +
> > optional kernel and/or userspace mapping helper everywhere? (Possibly
> > renaming it to something as simple as dma_alloc().)
>
> Well, dma_alloc_coherent users want a non-cached mapping. And while
> some architectures provide that using a vmap with "uncached" bits in the
> PTE to provide that, this:
>
>  a) is not possible everywhere
>  b) even where possible is not always the best idea as it creates mappings
>     with different cacheability bits

I think this could be addressed by having a dma_vmap() helper that
does the right thing, whether it's vmap() or dma_common_pages_remap()
as appropriate. Or would this still be insufficient for some
architectures?

>
> And even without that dma_alloc_noncoherent causes less overhead than
> dma_alloc_noncontiguous if you only need a single contiguous range.
>

Given that behind the scenes dma_alloc_noncontiguous() would also just
call __dma_alloc_pages() for devices that need contiguous pages, would
the overhead be basically the creation of a single-entry sgtable?

> So while I'm happy we have something useful for more complex drivers like
> v4l I think the simple dma_alloc_coherent API, including some of the less
> crazy flags for dma_alloc_attrs is the right thing to use for more than
> 90% of the use cases.

One thing to take into account here is that many drivers use the
existing "simple" way, just because there wasn't a viable alternative
to do something better. Agreed, though, that we shouldn't optimize for
the rare cases.

Best regards,
Tomasz