From: Christoph Hellwig
> Sent: 14 June 2019 14:47
>
> Many architectures (e.g. arm, m68 and sh) have always used exact
> allocation in their dma coherent allocator, which avoids a lot of
> memory waste especially for larger allocations.  Lift this behavior
> into the generic allocator so that dma-direct and the generic IOMMU
> code benefit from this behavior as well.
>
> Signed-off-by: Christoph Hellwig <hch@xxxxxx>
> ---
>  include/linux/dma-contiguous.h |  8 +++++---
>  kernel/dma/contiguous.c        | 17 +++++++++++------
>  2 files changed, 16 insertions(+), 9 deletions(-)
>
> diff --git a/include/linux/dma-contiguous.h b/include/linux/dma-contiguous.h
> index c05d4e661489..2e542e314acf 100644
> --- a/include/linux/dma-contiguous.h
> +++ b/include/linux/dma-contiguous.h
> @@ -161,15 +161,17 @@ static inline struct page *dma_alloc_contiguous(struct device *dev, size_t size,
>  		gfp_t gfp)
>  {
>  	int node = dev ? dev_to_node(dev) : NUMA_NO_NODE;
> -	size_t align = get_order(PAGE_ALIGN(size));
> +	void *cpu_addr = alloc_pages_exact_node(node, size, gfp);
>
> -	return alloc_pages_node(node, gfp, align);
> +	if (!cpu_addr)
> +		return NULL;
> +	return virt_to_page(p);
>  }

Does this still guarantee that requests for 16k will not cross a 16k boundary?
It looks like you are losing the alignment parameter.

There may be drivers and hardware that also require 12k allocations to not
cross 16k boundaries (etc).

	David
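
One way to reason about the boundary question is to model it in plain C.
The sketch below is userspace arithmetic only, not kernel code. It assumes
4k pages, and it assumes that the exact allocator first obtains a naturally
aligned power-of-two block and then releases only the unused tail pages
(which is how alloc_pages_exact() in mm/page_alloc.c appears to work; whether
alloc_pages_exact_node() from this series behaves the same way is an
assumption). All sketch_* names and SKETCH_PAGE_SIZE are hypothetical helpers
made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4k pages */

/* Smallest order with (SKETCH_PAGE_SIZE << order) >= size, modelled on get_order(). */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;

	while (((size_t)SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* True if [addr, addr + size) straddles a multiple of boundary (a power of two). */
static bool sketch_crosses_boundary(uintptr_t addr, size_t size, size_t boundary)
{
	assert(size != 0);
	assert(boundary != 0 && (boundary & (boundary - 1)) == 0);
	return (addr / boundary) != ((addr + size - 1) / boundary);
}

/*
 * Model of the assumed behaviour: block_start is aligned to the rounded-up
 * power-of-two block size, and only pages past 'size' are given back.
 * Under that assumption the buffer cannot cross a block-size boundary.
 */
static bool sketch_exact_alloc_is_safe(uintptr_t block_start, size_t size)
{
	size_t block_size = (size_t)SKETCH_PAGE_SIZE << sketch_get_order(size);

	assert((block_start & (block_size - 1)) == 0);
	return !sketch_crosses_boundary(block_start, size, block_size);
}
```

Under these assumptions a 12k request rounds up to a 16k block, so a buffer
that starts at the block start stays inside one 16k window. If the allocator
ever returned a start address that is only page aligned, the same check shows
that a 12k buffer can straddle a 16k boundary, which is the case the question
is worried about.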