On 28/12/2021 at 11:26, Kefeng Wang wrote:
>
> On 2021/12/27 23:56, Dave Hansen wrote:
>> On 12/27/21 6:59 AM, Kefeng Wang wrote:
>>> This patch select HAVE_ARCH_HUGE_VMALLOC to let X86_64 and X86_PAE
>>> support huge vmalloc mappings.
>> In general, this seems interesting and the diff is simple.  But, I don't
>> see _any_ x86-specific data.  I think the bare minimum here would be a
>> few kernel compiles and some 'perf stat' data for some TLB events.
>
> When the feature supported on ppc,
>
> commit 8abddd968a303db75e4debe77a3df484164f1f33
> Author: Nicholas Piggin <npiggin@xxxxxxxxx>
> Date:   Mon May 3 19:17:55 2021 +1000
>
>     powerpc/64s/radix: Enable huge vmalloc mappings
>
>     This reduces TLB misses by nearly 30x on a `git diff` workload on a
>     2-node POWER9 (59,800 -> 2,100) and reduces CPU cycles by 0.54%, due
>     to vfs hashes being allocated with 2MB pages.
>
> But the data could be different on different machine/arch.
>
>>> diff --git a/arch/x86/kernel/module.c b/arch/x86/kernel/module.c
>>> index 95fa745e310a..6bf5cb7d876a 100644
>>> --- a/arch/x86/kernel/module.c
>>> +++ b/arch/x86/kernel/module.c
>>> @@ -75,8 +75,8 @@ void *module_alloc(unsigned long size)
>>>       p = __vmalloc_node_range(size, MODULE_ALIGN,
>>>                       MODULES_VADDR + get_module_load_offset(),
>>> -                    MODULES_END, gfp_mask,
>>> -                    PAGE_KERNEL, VM_DEFER_KMEMLEAK, NUMA_NO_NODE,
>>> +                    MODULES_END, gfp_mask, PAGE_KERNEL,
>>> +                    VM_DEFER_KMEMLEAK | VM_NO_HUGE_VMAP, NUMA_NO_NODE,
>>>                       __builtin_return_address(0));
>>>       if (p && (kasan_module_alloc(p, size, gfp_mask) < 0)) {
>>>           vfree(p);
>> To figure out what's going on in this hunk, I had to look at the cover
>> letter (which I wasn't cc'd on).  That's not great and it means that
>> somebody who stumbles upon this in the code is going to have a really
>> hard time figuring out what is going on.  Cover letters don't make it
>> into git history.
> Sorry for that, will add more into arch's patch changelog.
>> This desperately needs a comment and some changelog material in *this*
>> patch.
>>
>> But, even the description from the cover letter is sparse:
>>
>>> There are some disadvantages about this feature[2], one of the main
>>> concerns is the possible memory fragmentation/waste in some scenarios,
>>> also archs must ensure that any arch specific vmalloc allocations that
>>> require PAGE_SIZE mappings(eg, module alloc with STRICT_MODULE_RWX)
>>> use the VM_NO_HUGE_VMAP flag to inhibit larger mappings.
>> That just says that x86 *needs* PAGE_SIZE allocations.  But, what
>> happens if VM_NO_HUGE_VMAP is not passed (like it was in v1)?  Will the
>> subsequent permission changes just fragment the 2M mapping?
>
> Yes, without VM_NO_HUGE_VMAP, it could fragment the 2M mapping.
>
> When module alloc with STRICT_MODULE_RWX on x86, it calls
> __change_page_attr() from set_memory_ro/rw/nx which will split large
> page, so there is no need to make module alloc with HUGE_VMALLOC.
>

Maybe there is no need to perform the module alloc with HUGE_VMALLOC,
but at least it would still work if you do so.

Powerpc did add VM_NO_HUGE_VMAP temporarily, for a reason that is
explained in a comment. If x86 already has the necessary logic to handle
it, why add VM_NO_HUGE_VMAP?

Christophe
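
To make the splitting question concrete, here is a toy user-space model of what is being discussed. It is not kernel code: the struct, the function names and the PERM_* values are all made up for illustration, and it only assumes that a permission change on part of a 2M mapping forces a split into 512 4K entries, while a change covering the whole 2M range can keep the large entry.

```c
#include <assert.h>
#include <stdbool.h>

#define PAGES_PER_HUGE 512   /* 2 MiB / 4 KiB */

/* Hypothetical permission values, not the kernel's pgprot_t. */
enum perm { PERM_RW, PERM_RO, PERM_RX };

/* Toy model of one 2 MiB-aligned slot of kernel virtual address space. */
struct slot {
    bool huge;                      /* true: mapped by a single 2 MiB entry */
    enum perm huge_perm;            /* valid while huge is true */
    enum perm pte[PAGES_PER_HUGE];  /* valid once the slot has been split */
};

/* Replace the single large entry by 512 small ones with the same permission. */
void split_huge(struct slot *s)
{
    if (!s->huge)
        return;
    for (int i = 0; i < PAGES_PER_HUGE; i++)
        s->pte[i] = s->huge_perm;
    s->huge = false;
}

/* Change the permission of pages [first, first + count); -1 on a bad range. */
int set_perm(struct slot *s, int first, int count, enum perm p)
{
    if (first < 0 || count < 0 || first + count > PAGES_PER_HUGE)
        return -1;
    if (s->huge) {
        if (first == 0 && count == PAGES_PER_HUGE) {
            s->huge_perm = p;       /* whole range: large entry survives */
            return 0;
        }
        split_huge(s);              /* partial range: must fragment */
    }
    assert(!s->huge);
    for (int i = first; i < first + count; i++)
        s->pte[i] = p;
    return 0;
}

/* Number of leaf entries (roughly, TLB entries) needed to cover the slot. */
int leaf_entries(const struct slot *s)
{
    return s->huge ? 1 : PAGES_PER_HUGE;
}
```

In this model, a slot that starts as one large RW entry and then has its first 16 pages made RX ends up with 512 leaf entries, which is the fragmentation Dave asked about. The mapping still works after the split; only the TLB benefit is lost.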