Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote on Sat, Oct 15, 2022 at 00:56:
>
> On 10/12/22 12:45, Hugh Dickins wrote:
> > On Wed, 12 Oct 2022, Albert Huang wrote:
> >
> > > From: "huangjie.albert" <huangjie.albert@xxxxxxxxxxxxx>
> > >
> > > implement these two functions so that we can set the mempolicy to
> > > the inode of the hugetlb file. This ensures that the mempolicy of
> > > all processes sharing this huge page file is consistent.
> > >
> > > In some scenarios where huge pages are shared:
> > > if we need to limit the memory usage of vm within node0, so I set qemu's
> > > mempolicy bind to node0, but if there is a process (such as virtiofsd)
> > > shared memory with the vm, in this case. If the page fault is triggered
> > > by virtiofsd, the allocated memory may go to node1 which depends on
> > > virtiofsd.
> > >
> > > Signed-off-by: huangjie.albert <huangjie.albert@xxxxxxxxxxxxx>
>
> Thanks for the patch Albert, and thank you Hugh for the comments!
>
> > Aha! Congratulations for noticing, after all this time. hugetlbfs
> > contains various little pieces of code that pretend to be supporting
> > shared NUMA mempolicy, but in fact there was nothing connecting it up.
>
> I actually had to look this up to verify it was not supported. However, the
> documentation is fairly clear.
> From admin-guide/mm/numa_memory_policy.rst.
>
> "As of 2.6.22, only shared memory segments, created by shmget() or
> mmap(MAP_ANONYMOUS|MAP_SHARED), support shared policy. When shared
> policy support was added to Linux, the associated data structures were
> added to hugetlbfs shmem segments. At the time, hugetlbfs did not
> support allocation at fault time--a.k.a lazy allocation--so hugetlbfs
> shmem segments were never "hooked up" to the shared policy support.
> Although hugetlbfs segments now support lazy allocation, their support
> for shared policy has not been completed."
>
> It is somewhat embarrassing that this has been known for so long and
> nothing has changed.
>
> > It will be for Mike to decide, but personally I oppose adding
> > shared NUMA mempolicy support to hugetlbfs, after eighteen years.
> >
> > The thing is, it will change the behaviour of NUMA on hugetlbfs:
> > in ways that would have been sensible way back then, yes; but surely
> > those who have invested in NUMA and hugetlbfs have developed other
> > ways of administering it successfully, without shared NUMA mempolicy.
> >
> > At the least, I would expect some tests to break (I could easily be
> > wrong), and there's a chance that some app or tool would break too.
> >
> > I have carried the reverse of Albert's patch for a long time, stripping
> > out the pretence of shared NUMA mempolicy support from hugetlbfs: I
> > wanted that, so that I could work on modifying the tmpfs implementation,
> > without having to worry about other users.
> >
> > Mike, if you would prefer to see my patch stripping out the pretence,
> > let us know: it has never been a priority to send in, but I can update
> > it to 6.1-rc1 if you'd like to see it. (Once upon a time, it removed
> > all need for struct hugetlbfs_inode_info, but nowadays that's still
> > required for the memfd seals.)
> >
> > Whether Albert's patch is complete and correct, I haven't begun to think
> > about: I am not saying it isn't, but shared NUMA mempolicy adds another
> > dimension of complexity, and need for support, that I think hugetlbfs
> > would be better off continuing to survive without.
>
> To be honest, I have not looked into the complexities of shared NUMA
> mempolicy and exactly what is required for its support. With my limited
> knowledge, it appears that this patch adds some type of support for shared
> policy, but it may not provide all support mentioned in the documentation.
>
> At the very least, this patch should also update documentation to state
> what type of support is provided.
>
> Albert, can you look into what would be required for full support?
> I can take a look as well but have some other higher priority tasks to
> work first.
>

I'm glad to take this on; let me think about it.

> TBH, I like Hugh's idea of removing the 'pretence of shared policy support'.
> We are currently wasting memory carrying around extra unused fields in
> hugetlbfs_inode_info. :(
> --
> Mike Kravetz