On 10/17/22 13:33, David Hildenbrand wrote:
> On 17.10.22 11:48, 黄杰 wrote:
> > David Hildenbrand <david@xxxxxxxxxx> wrote on Mon, Oct 17, 2022 at 16:44:
> > >
> > > On 12.10.22 10:15, Albert Huang wrote:
> > > > From: "huangjie.albert" <huangjie.albert@xxxxxxxxxxxxx>
> > > >
> > > > implement these two functions so that we can set the mempolicy to
> > > > the inode of the hugetlb file. This ensures that the mempolicy of
> > > > all processes sharing this huge page file is consistent.
> > > >
> > > > In some scenarios where huge pages are shared:
> > > > if we need to limit the memory usage of vm within node0, so I set qemu's
> > > > mempolicy bind to node0, but if there is a process (such as virtiofsd)
> > > > shared memory with the vm, in this case. If the page fault is triggered
> > > > by virtiofsd, the allocated memory may go to node1 which depends on
> > > > virtiofsd.
> > > >
> > >
> > > Any VM that uses hugetlb should be preallocating memory. For example,
> > > this is the expected default under QEMU when using huge pages.
> > >
> > > Once preallocation does the right thing regarding NUMA policy, there is
> > > no need to worry about it in other sub-processes.
> > >
> >
> > Hi, David
> > thanks for your reminder
> >
> > Yes, you are absolutely right, However, the pre-allocation mechanism
> > does solve this problem.
> > However, some scenarios do not like to use the pre-allocation mechanism, such as
> > scenarios that are sensitive to virtual machine startup time, or
> > scenarios that require high memory utilization. The on-demand
> > allocation mechanism may be better, so the key point is to find a way
> > to support a shared policy.
>
> Using hugetlb -- with a fixed pool size -- without preallocation is like
> playing with fire. Hugetlb reservation makes one believe that on-demand
> allocation is going to work, but there are various scenarios where that can
> go seriously wrong, and you can run out of huge pages.

I absolutely agree with this cautionary note.
hugetlb reservations guarantee that a sufficient number of huge pages exist.
However, there is no guarantee that those pages are on any specific node
associated with a NUMA policy. Therefore, an 'on demand' allocation could
fail, resulting in SIGBUS being sent to the faulting process.
--
Mike Kravetz
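
As a rough illustration of the per-node point above, the sketch below reads
the per-node free huge page counters from sysfs and checks whether the nodes
allowed by a bind policy currently hold enough free pages. The function names,
the default 2048 kB page size, and the example node set and page count are
assumptions made for illustration; none of them come from this thread. The
check is only a snapshot, so it can narrow down a misconfiguration but cannot
guarantee that a later on-demand fault will succeed.

```python
import os
import re


def free_hugepages_per_node(root="/sys/devices/system/node", size_kb=2048):
    """Return {node_id: free_huge_pages} from the per-node sysfs counters.

    `root` is a parameter only so the function can be pointed at a test tree;
    `size_kb` is the huge page size in kB (2048 is an assumed default).
    """
    counts = {}
    if not os.path.isdir(root):
        return counts
    for entry in os.listdir(root):
        match = re.fullmatch(r"node(\d+)", entry)
        if not match:
            continue  # skip non-node entries such as "possible" or "online"
        path = os.path.join(root, entry, "hugepages",
                            f"hugepages-{size_kb}kB", "free_hugepages")
        try:
            with open(path) as f:
                counts[int(match.group(1))] = int(f.read().strip())
        except (OSError, ValueError):
            # No pool of this size on the node, or an unreadable counter.
            continue
    return counts


def enough_free_on_nodes(needed, counts, allowed_nodes):
    """True if the allowed nodes together hold at least `needed` free pages."""
    return sum(counts.get(node, 0) for node in allowed_nodes) >= needed


if __name__ == "__main__":
    counts = free_hugepages_per_node()
    print("free huge pages per node:", counts)
    # Hypothetical requirement: 512 pages, bound to node 0 only.
    print("node 0 can back 512 pages:", enough_free_on_nodes(512, counts, {0}))
```

A global pool can look large enough while the bound node is empty, which is
the case where a fault under a bind policy fails even though the reservation
succeeded.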