Excerpts from Claudio Imbrenda's message of June 11, 2021 1:42 am:
> The recent patches to add support for hugepage vmalloc mappings added a
> flag for __vmalloc_node_range to allow to request small pages.
> This flag is not accessible when calling vmalloc, the only option is to
> call directly __vmalloc_node_range, which is not exported.
>
> This means that a module can't vmalloc memory with small pages.
>
> Case in point: KVM on s390x needs to vmalloc a large area, and it needs
> to be mapped with small pages, because of a hardware limitation.
>
> This patch adds the function vmalloc_no_huge, which works like vmalloc,
> but it is guaranteed to always back the mapping using small pages. This
> function is exported, therefore it is usable by modules.
>
> Signed-off-by: Claudio Imbrenda <imbrenda@xxxxxxxxxxxxx>
> Cc: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
> Cc: Nicholas Piggin <npiggin@xxxxxxxxx>
> Cc: Uladzislau Rezki (Sony) <urezki@xxxxxxxxx>
> Cc: Catalin Marinas <catalin.marinas@xxxxxxx>
> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
> Cc: David Rientjes <rientjes@xxxxxxxxxx>
> Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>
> ---
>  include/linux/vmalloc.h |  1 +
>  mm/vmalloc.c            | 16 ++++++++++++++++
>  2 files changed, 17 insertions(+)
>
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 4d668abb6391..bfaaf0b6fa76 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -135,6 +135,7 @@ extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
>                          const void *caller);
>  void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
>                  int node, const void *caller);
> +void *vmalloc_no_huge(unsigned long size);
>
>  extern void vfree(const void *addr);
>  extern void vfree_atomic(const void *addr);
> diff --git a/mm/vmalloc.c b/mm/vmalloc.c
> index a13ac524f6ff..296a2fcc3fbe 100644
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -2998,6 +2998,22 @@ void *vmalloc(unsigned long size)
>  }
>  EXPORT_SYMBOL(vmalloc);
>
> +/**
> + * vmalloc_no_huge - allocate virtually contiguous memory using small pages
> + * @size: allocation size
> + *
> + * Allocate enough non-huge pages to cover @size from the page level
> + * allocator and map them into contiguous kernel virtual space.
> + *
> + * Return: pointer to the allocated memory or %NULL on error
> + */
> +void *vmalloc_no_huge(unsigned long size)
> +{
> +        return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END, GFP_KERNEL, PAGE_KERNEL,
> +                                    VM_NO_HUGE_VMAP, NUMA_NO_NODE, __builtin_return_address(0));
> +}
> +EXPORT_SYMBOL(vmalloc_no_huge);

At some point if the combination of flags becomes too much we will need
a different strategy. A vmalloc API with (size, align, gfp_t, vm_flags,
node) args would help 3/6 of the existing non-arch callers too. And one
more if you had a prot parameter or _exec variant.

But for now I'm okay with this.

Acked-by: Nicholas Piggin <npiggin@xxxxxxxxx>

Thanks,
Nick