Hi,

This is a followup to the previous attempt to overhaul how vmalloc
permissions are done:
https://lore.kernel.org/lkml/20201120202426.18009-1-rick.p.edgecombe@xxxxxxxxx/

While working on the next version it dawned on me that we can
significantly reduce direct map breakage on x86 with a much less
invasive change, so I thought I would start there in the meantime. In a
test of booting Fedora and running the BPF unit tests, this reduced 4k
direct map pages by 98%.

It simply backs x86 module_alloc() mappings with pages taken from a
cache built out of 2MB pages, so that all of the later breakage
clusters in 2MB blocks on the direct map. The trade-off is that colder
pages are used for these allocations. All module_alloc() users
(modules, eBPF JIT, ftrace, kprobes) get this behavior. Potentially
this behavior should also be enabled for eBPF byte code allocations in
the !CONFIG_BPF_JIT_ALWAYS_ON case.

The new APIs and the more invasive changes in the callers can happen
once vmalloc huge pages bring more benefits. That said, I can post the
shootdown reduction changes with the previous comments integrated if
anyone disagrees.

Based on v5.11.

Thanks,

Rick

Rick Edgecombe (3):
  list: Support getting most recent element in list_lru
  vmalloc: Support grouped page allocations
  x86/module: Use VM_GROUP_PAGES flag

 arch/x86/Kconfig         |   1 +
 arch/x86/kernel/module.c |   2 +-
 include/linux/list_lru.h |  13 +++
 include/linux/vmalloc.h  |   1 +
 mm/Kconfig               |   9 ++
 mm/list_lru.c            |  28 +++++
 mm/vmalloc.c             | 215 +++++++++++++++++++++++++++++++++++++--
 7 files changed, 257 insertions(+), 12 deletions(-)

-- 
2.29.2
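
For reference, the x86 hook-up in patch 3 boils down to module_alloc()
passing the new VM_GROUP_PAGES flag (added in patch 2) in the vm_flags
argument of __vmalloc_node_range(). Below is a rough sketch, assuming
the flag plumbs straight through and using the existing v5.11 body of
arch/x86/kernel/module.c for context; the actual diff may differ in
detail:

/*
 * Sketch only: arch/x86/kernel/module.c::module_alloc() with the new
 * VM_GROUP_PAGES flag.  VM_GROUP_PAGES comes from patch 2 of this
 * series; the rest (MODULE_ALIGN, MODULES_VADDR/END/LEN,
 * get_module_load_offset(), kasan_module_alloc()) is the existing
 * v5.11 code and its usual includes.
 */
void *module_alloc(unsigned long size)
{
	gfp_t gfp_mask = GFP_KERNEL;
	void *p;

	if (PAGE_ALIGN(size) > MODULES_LEN)
		return NULL;

	p = __vmalloc_node_range(size, MODULE_ALIGN,
				 MODULES_VADDR + get_module_load_offset(),
				 MODULES_END, gfp_mask, PAGE_KERNEL,
				 VM_GROUP_PAGES, /* new: back with pages from the 2MB cache */
				 NUMA_NO_NODE, __builtin_return_address(0));
	if (p && (kasan_module_alloc(p, size) < 0)) {
		vfree(p);
		return NULL;
	}

	return p;
}

Since modules, eBPF JIT, ftrace and kprobes all allocate through this
one function, they pick up the grouped allocation behavior with no
further changes, which is why the arch side of the diffstat is only two
lines.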