Re: [PATCH 01/12] hugetlbfs: drop shared NUMA mempolicy pretence

Andi Kleen <ak@xxxxxxxxxxxxxxx> · Mon, 25 Sep 2023 15:46:31 -0700

On Mon, Sep 25, 2023 at 01:21:10AM -0700, Hugh Dickins wrote:
> hugetlbfs_fallocate() goes through the motions of pasting a shared NUMA
> mempolicy onto its pseudo-vma, but how could there ever be a shared NUMA
> mempolicy for this file?  hugetlb_vm_ops has never offered a set_policy
> method, and hugetlbfs_parse_param() has never supported any mpol options
> for a mount-wide default policy.
> 
> It's just an illusion: clean it away so as not to confuse others, giving
> us more freedom to adjust shmem's set_policy/get_policy implementation.
> But hugetlbfs_inode_info is still required, just to accommodate seals.
> 
> Yes, shared NUMA mempolicy support could be added to hugetlbfs, with a
> set_policy method and/or mpol mount option (Andi's first posting did
> include an admitted-unsatisfactory hugetlb_set_policy()); but it seems
> that nobody has bothered to add that in the nineteen years since v2.6.7
> made it possible, and there is at least one company that has invested
> enough into hugetlbfs, that I guess they have learnt well enough how to
> manage its NUMA, without needing shared mempolicy.

TBH i'm not sure people in general rely on shared mempolicy. The
original use case for it was to modify the numa policy of non anonymous
shared memory files without modifying the program (e.g. Oracle
database's shared memory segments)

But I don't think that particular usage model ever got any real
traction: at leas I haven't seen any real usage of it outside my tests.

I suspect people either are fine with just process policy or modify the
program, in which case it's not a big burden to modify every user,
so process policy or vma based mbind policy works fine.

Maybe it would be an interesting experiment to disable it everywhere
with some flag and see if anybody complains.

On the other hand it might be Hyrum'ed by now.

-Andi