Excerpts from Edgecombe, Rick P's message of December 1, 2020 6:21 am:
> On Sun, 2020-11-29 at 01:25 +1000, Nicholas Piggin wrote:
>> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
>> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
>> supports PMD sized vmap mappings.
>>
>> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
>> or larger, and fall back to small pages if that was unsuccessful.
>>
>> Allocations that do not use PAGE_KERNEL prot are not permitted to use
>> huge pages, because not all callers expect this (e.g., module
>> allocations vs strict module rwx).
>
> Several architectures (x86, arm64, others?) allocate modules initially
> with PAGE_KERNEL and so I think this test will not exclude module
> allocations in those cases.

Ah, thanks. I guess archs must additionally ensure that their
PAGE_KERNEL allocations are suitable for huge page mappings before
enabling the option.

If there is interest from those archs in supporting this, I have an
early (un-posted) patch that adds an explicit VM_HUGE flag that could
override the pessimistic arch default. It's not much trouble to add
this to the large system hash allocations. It's very out of date now,
but I can at least give what I have to anyone doing arch support who
wants it; a rough sketch of the idea is at the end of this mail.

>
> [snip]
>
>> @@ -2400,6 +2453,7 @@ static inline void set_area_direct_map(const struct vm_struct *area,
>>  {
>>  	int i;
>>
>> +	/* HUGE_VMALLOC passes small pages to set_direct_map */
>>  	for (i = 0; i < area->nr_pages; i++)
>>  		if (page_address(area->pages[i]))
>>  			set_direct_map(area->pages[i]);
>> @@ -2433,11 +2487,12 @@ static void vm_remove_mappings(struct vm_struct *area, int deallocate_pages)
>>  	 * map. Find the start and end range of the direct mappings to make sure
>>  	 * the vm_unmap_aliases() flush includes the direct map.
>>  	 */
>> -	for (i = 0; i < area->nr_pages; i++) {
>> +	for (i = 0; i < area->nr_pages; i += 1U << area->page_order) {
>>  		unsigned long addr = (unsigned long)page_address(area->pages[i]);
>>  		if (addr) {
>> +			unsigned long page_size = PAGE_SIZE << area->page_order;
>>  			start = min(addr, start);
>> -			end = max(addr + PAGE_SIZE, end);
>> +			end = max(addr + page_size, end);
>>  			flush_dmap = 1;
>>  		}
>>  	}
>
> The logic around this is a bit tangled. The reset of the direct map has
> to succeed, but if the set_direct_map_() functions require a split they
> could fail. For x86, set_memory_ro() calls on a vmalloc alias will
> mirror the page size and permission on the direct map and so the direct
> map will be broken to 4k pages if it's a RO vmalloc allocation.
>
> But after this, module vmalloc()'s could have large pages which would
> result in large RO pages on the direct map. Then it could possibly fail
> when trying to reset a 4k page out of a large RO direct map mapping.
>
> I think either module allocations need to be actually excluded from
> having large pages (seems like you might have seen other issues as
> well?), or another option could be to use the changes here:
> https://lore.kernel.org/lkml/20201125092208.12544-4-rppt@xxxxxxxxxx/
> to reset the direct map for a large page range at a time for large
> vmalloc pages.
>

Right, x86 would have to do something about that before enabling this.
A VM_HUGE flag might be quick and easy, but maybe other options are not
too difficult either.

Thanks,
Nick
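
Roughly, the VM_HUGE idea looks something like the below. This is an
untested sketch from memory and not the actual un-posted patch: the flag
value and the helper name are made up here, only the VM_HUGE name and
the existing PAGE_KERNEL / PMD-size conditions come from the discussion
above. It would sit in mm/vmalloc.c where the huge mapping decision is
made.

/*
 * Hypothetical sketch: a VM_HUGE vm_flags bit so callers that know their
 * mapping tolerates huge pages (e.g. the large system hash allocations)
 * can opt in, while the arch default stays pessimistic.
 */
#define VM_HUGE		0x00000800	/* hypothetical flag value, illustration only */

static bool vm_try_huge_pages(unsigned long size, pgprot_t prot,
			      unsigned long vm_flags)
{
	if (!IS_ENABLED(CONFIG_HAVE_ARCH_HUGE_VMALLOC))
		return false;
	/* Only worth attempting for at least one PMD-sized page. */
	if (size < PMD_SIZE)
		return false;
	/* Non-default protections never get huge pages (module rwx etc.). */
	if (pgprot_val(prot) != pgprot_val(PAGE_KERNEL))
		return false;
	/*
	 * Pessimistic default: require an explicit opt-in from the caller.
	 * An arch that has audited all of its PAGE_KERNEL vmalloc users
	 * could drop this last test and allow huge pages unconditionally.
	 */
	return !!(vm_flags & VM_HUGE);
}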