Excerpts from Linus Torvalds's message of April 22, 2022 1:44 am:
> On Thu, Apr 21, 2022 at 1:57 AM Nicholas Piggin <npiggin@xxxxxxxxx> wrote:
>>
>> Those were (AFAIKS) all in arch code though.
>
> No Nick, they really weren't.
>
> The bpf issue with VM_FLUSH_RESET_PERMS means that all your arguments
> are invalid, because this affected non-architecture code.

The VM_FLUSH_RESET_PERMS issue arose because bpf uses the arch module
allocation code, and the arch-specific direct map manipulation code
there was not capable of dealing with huge pages. An x86 bug.

> So the bpf case had two independent issues: one was just bpf doing a
> really bad job at making sure the executable mapping was sanely
> initialized.
>
> But the other was an actual bug in that hugepage case for vmalloc.
>
> And that bug was an issue on power too.

I missed it, which bug was that?

>
> So your "this is purely an x86 issue" argument is simply wrong.
> Because I'm very much looking at that power code that says "oh,
> __module_alloc() needs more work".
>
> Notice?

No, I don't notice. More work to support huge allocations for
executable mappings, sure. But the arch's implementation explicitly
does not support that yet. That doesn't make huge vmalloc broken!
Ridiculous. It works fine.

>
> Can these be fixed? Yes. But they can't be fixed by saying "oh, let's
> disable it on x86".

You did just effectively disable it on x86 though.

And why can't it be reverted on x86 until it's fixed on x86??

> Although it's probably true that at that point, some of the issues
> would no longer be nearly as noticeable.

There really aren't all these "issues" you're imagining. They
aren't noticeable now, on power or s390, because they have
non-buggy HAVE_ARCH_HUGE_VMALLOC implementations.

If you're really going to insist on this, will you apply this to fix
(some of) the performance regressions it introduced?

Thanks,
Nick

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6e5b4488a0c5..b555f17e84d5 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8919,7 +8919,10 @@ void *__init alloc_large_system_hash(const char *tablename,
 				table = memblock_alloc_raw(size,
 							   SMP_CACHE_BYTES);
 		} else if (get_order(size) >= MAX_ORDER || hashdist) {
-			table = __vmalloc(size, gfp_flags);
+			if (IS_ENABLED(CONFIG_PPC) || IS_ENABLED(CONFIG_S390))
+				table = vmalloc_huge(size, gfp_flags);
+			else
+				table = __vmalloc(size, gfp_flags);
 			virt = true;
 			if (table)
 				huge = is_vm_area_hugepages(table);