Excerpts from Christoph Hellwig's message of January 25, 2021 1:07 am:
> On Sun, Jan 24, 2021 at 06:22:29PM +1000, Nicholas Piggin wrote:
>> diff --git a/arch/Kconfig b/arch/Kconfig
>> index 24862d15f3a3..f87feb616184 100644
>> --- a/arch/Kconfig
>> +++ b/arch/Kconfig
>> @@ -724,6 +724,16 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>>  config HAVE_ARCH_HUGE_VMAP
>>  	bool
>>
>> +config HAVE_ARCH_HUGE_VMALLOC
>> +	depends on HAVE_ARCH_HUGE_VMAP
>> +	bool
>> +	help
>> +	  Archs that select this would be capable of PMD-sized vmaps (i.e.,
>> +	  arch_vmap_pmd_supported() returns true), and they must make no
>> +	  assumptions that vmalloc memory is mapped with PAGE_SIZE ptes. The
>> +	  VM_NOHUGE flag can be used to prohibit arch-specific allocations from
>> +	  using hugepages to help with this (e.g., modules may require it).
>
> help texts don't make sense for options that aren't user visible.

Yeah, it was supposed to just be a comment; even if the option were user
visible, that kind of thing would not make sense in help text anyway. I'll
turn it into a real comment as per Randy's suggestion.

> More importantly, is there any good reason to keep the option and not
> just go the extra step and enable huge page vmalloc for arm64 and x86
> as well?

Yes: they first need to ensure they exclude vmallocs that can't be huge,
one way or another (a VM_ flag or a prot arg). After they're converted we
can fold it into HUGE_VMAP.

>> +static inline bool is_vm_area_hugepages(const void *addr)
>> +{
>> +	/*
>> +	 * This may not 100% tell if the area is mapped with > PAGE_SIZE
>> +	 * page table entries, if for some reason the architecture indicates
>> +	 * larger sizes are available but decides not to use them, nothing
>> +	 * prevents that. This only indicates the size of the physical page
>> +	 * allocated in the vmalloc layer.
>> +	 */
>> +	return (find_vm_area(addr)->page_order > 0);
>
> No need for the braces here.
>
>> }
>>
>> +static int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
>> +		pgprot_t prot, struct page **pages, unsigned int page_shift)
>> +{
>> +	unsigned int i, nr = (end - addr) >> PAGE_SHIFT;
>> +
>> +	WARN_ON(page_shift < PAGE_SHIFT);
>> +
>> +	if (page_shift == PAGE_SHIFT)
>> +		return vmap_small_pages_range_noflush(addr, end, prot, pages);
>
> This begs for an IS_ENABLED check to disable the hugepage code for
> architectures that don't need it.

Yeah, good point.
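Something like the below, maybe (untested sketch; the hugepage mapping
loop stays as in the patch). On archs without HAVE_ARCH_HUGE_VMALLOC the
IS_ENABLED() folds to a compile-time false, so the compiler can discard
the hugepage path entirely:

static int vmap_pages_range_noflush(unsigned long addr, unsigned long end,
		pgprot_t prot, struct page **pages, unsigned int page_shift)
{
	unsigned int i, nr = (end - addr) >> PAGE_SHIFT;

	WARN_ON(page_shift < PAGE_SHIFT);

	/*
	 * Take the small-page path both when the caller asks for it and
	 * when the kernel is built without huge vmalloc support, in which
	 * case the condition is constant and the loop below is dead code.
	 */
	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC) ||
			page_shift == PAGE_SHIFT)
		return vmap_small_pages_range_noflush(addr, end, prot, pages);

	/* ... hugepage mapping loop as in the patch ... */
}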
>> +int map_kernel_range_noflush(unsigned long addr, unsigned long size,
>> +		pgprot_t prot, struct page **pages)
>> +{
>> +	return vmap_pages_range_noflush(addr, addr + size, prot, pages, PAGE_SHIFT);
>> +}
>
> Please just kill off map_kernel_range_noflush and map_kernel_range
> entirely in favor of the vmap versions.

I can do a cleanup patch for that on top of this one.

>> +	for (i = 0; i < area->nr_pages; i += 1U << area->page_order) {
>
> Maybe using a helper that takes the vm_area_struct and either returns
> area->page_order or always 0 based on IS_ENABLED?

I'll see how it looks.
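Maybe along these lines (untested; vm_area_page_order is just a
placeholder name, and it takes the struct vm_struct since that is what
area is here):

/*
 * Order of the physical pages backing the area, or always 0 when the
 * kernel is built without huge vmalloc support, so the shift at the
 * call sites folds to a constant 1.
 */
static inline unsigned int vm_area_page_order(const struct vm_struct *area)
{
	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC))
		return 0;
	return area->page_order;
}

and the loop then becomes:

	for (i = 0; i < area->nr_pages; i += 1U << vm_area_page_order(area)) {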
Thanks,
Nick