Excerpts from Linus Torvalds's message of April 21, 2022 4:02 pm:
> On Wed, Apr 20, 2022 at 10:48 PM Linus Torvalds
> <torvalds@xxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> The largepage thing needs to be opt-in, and needs a lot more care.
>
> Side note: part of the opt-in really should be about the performance impact.
>
> It clearly can be quite noticeable, as outlined by that powerpc case
> in commit 8abddd968a30 ("powerpc/64s/radix: Enable huge vmalloc
> mappings"), but it presumably is some _particular_ case that actually
> matters.
>
> But it's equally clearly not the module code/data case, since
> __module_alloc() explicitly disables largepages on powerpc.
>
> At a guess, it's one or more of the large hash-table allocations.

The changelog is explicit that it is the vfs hashes.

> And it would actually be interesting to hear *which*one*. From the
> 'git diff' workload, I'd expect it to be the dentry lookup hash table
> - I can't think of anything else that would be vmalloc'ed that would
> be remotely interesting - but who knows.

I didn't measure dentry/inode separately but it should mostly
(~entirely?) be the dentry hash, yes.

> So I think the whole "opt in" isn't _purely_ about the "oh, random
> cases are broken for odd reasons, so let's not enable it by default".

The whole concept is totally broken upstream now though. Core code
absolutely cannot mark any allocation as able to use huge pages
because x86 is in some crazy half-working state. Can we use hugepage
dentry cache with x86 with hibernation? With BPF? Who knows.

> I think it would actually be good to literally mark the cases that
> matter (and have the performance numbers for those cases).

As per previous comment, not for correctness but possibly to help
guide some heuristic. I don't see it being too big a deal though: a
multi-MB vmalloc that can use hugepages probably wants to, with quite
a small downside (fragmentation being about the only one, but there
aren't a vast number of such allocations in the kernel to have been
noticed as yet).

Thanks,
Nick