On 10/19/22 2:59 PM, Albert Huang wrote:
> From: "huangjie.albert" <huangjie.albert@xxxxxxxxxxxxx>
>
> implement get/set_policy for hugetlb_vm_ops to support the shared policy
> This ensures that the mempolicy of all processes sharing this huge page
> file is consistent.
>
> In some scenarios where huge pages are shared:
> if we need to limit the memory usage of vm within node0, so I set qemu's
> mempilciy bind to node0, but if there is a process (such as virtiofsd)
> shared memory with the vm, in this case. If the page fault is triggered
> by virtiofsd, the allocated memory may go to node1 which depends on
> virtiofsd. Although we can use the memory prealloc provided by qemu to
> avoid this issue, but this method will significantly increase the
> creation time of the vm(a few seconds, depending on memory size).
>
> after we hooked up hugetlb_vm_ops(set/get_policy):
> both the shared memory segments created by shmget() with SHM_HUGETLB flag
> and the mmap(MAP_SHARED|MAP_HUGETLB), also support shared policy.
>
> v1->v2:
> 1、hugetlb share the memory policy when the vma with the VM_SHARED flag.
> 2、update the documentation.
>
> Signed-off-by: huangjie.albert <huangjie.albert@xxxxxxxxxxxxx>
> ---
>  .../admin-guide/mm/numa_memory_policy.rst     | 20 +++++++++------
>  mm/hugetlb.c                                  | 25 +++++++++++++++++++
>  2 files changed, 37 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/admin-guide/mm/numa_memory_policy.rst b/Documentation/admin-guide/mm/numa_memory_policy.rst
> index 5a6afecbb0d0..5672a6c2d2ef 100644
> --- a/Documentation/admin-guide/mm/numa_memory_policy.rst
> +++ b/Documentation/admin-guide/mm/numa_memory_policy.rst
> @@ -133,14 +133,18 @@ Shared Policy
>      the object share the policy, and all pages allocated for the
>      shared object, by any task, will obey the shared policy.
>
> -    As of 2.6.22, only shared memory segments, created by shmget() or
> -    mmap(MAP_ANONYMOUS|MAP_SHARED), support shared policy.  When shared
> -    policy support was added to Linux, the associated data structures were
> -    added to hugetlbfs shmem segments.  At the time, hugetlbfs did not
> -    support allocation at fault time--a.k.a lazy allocation--so hugetlbfs
> -    shmem segments were never "hooked up" to the shared policy support.
> -    Although hugetlbfs segments now support lazy allocation, their support
> -    for shared policy has not been completed.
> +    As of 2.6.22, only shared memory segments, created by shmget() without
> +    SHM_HUGETLB flag or mmap(MAP_ANONYMOUS|MAP_SHARED) without MAP_HUGETLB
> +    flag, support shared policy. When shared policy support was added to Linux,
> +    the associated data structures were added to hugetlbfs shmem segments.
> +    At the time, hugetlbfs did not support allocation at fault time--a.k.a
> +    lazy allocation--so hugetlbfs shmem segments were never "hooked up" to
> +    the shared policy support. Although hugetlbfs segments now support lazy
> +    allocation, their support for shared policy has not been completed.
> +
> +    after we hooked up hugetlb_vm_ops(set/get_policy):
> +    both the shared memory segments created by shmget() with SHM_HUGETLB flag
> +    and mmap(MAP_SHARED|MAP_HUGETLB), also support shared policy.
>
>      As mentioned above in :ref:`VMA policies <vma_policy>` section,
>      allocations of page cache pages for regular files mmap()ed
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 87d875e5e0a9..fc7038931832 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -4632,6 +4632,27 @@ static vm_fault_t hugetlb_vm_op_fault(struct vm_fault *vmf)
>  	return 0;
>  }
>
> +#ifdef CONFIG_NUMA
> +int hugetlb_vm_op_set_policy(struct vm_area_struct *vma, struct mempolicy *mpol)
> +{
> +	struct inode *inode = file_inode(vma->vm_file);
> +
> +	if (!(vma->vm_flags & VM_SHARED))
> +		return 0;
> +
> +	return mpol_set_shared_policy(&HUGETLBFS_I(inode)->policy, vma, mpol);
> +}
> +
> +struct mempolicy *hugetlb_vm_op_get_policy(struct vm_area_struct *vma, unsigned long addr)
> +{
> +	struct inode *inode = file_inode(vma->vm_file);
> +	pgoff_t index;
> +
> +	index = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
> +	return mpol_shared_policy_lookup(&HUGETLBFS_I(inode)->policy, index);
> +}
> +#endif
> +
>  /*
>   * When a new function is introduced to vm_operations_struct and added
>   * to hugetlb_vm_ops, please consider adding the function to shm_vm_ops.
> @@ -4645,6 +4666,10 @@ const struct vm_operations_struct hugetlb_vm_ops = {
>  	.close = hugetlb_vm_op_close,
>  	.may_split = hugetlb_vm_op_split,
>  	.pagesize = hugetlb_vm_op_pagesize,
> +#ifdef CONFIG_NUMA
> +	.set_policy = hugetlb_vm_op_set_policy,
> +	.get_policy = hugetlb_vm_op_get_policy,
> +#endif
>  };
>
>  static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,

How is the current usage of

	/* Set numa allocation policy based on index */
	hugetlb_set_vma_policy(&pseudo_vma, inode, index);

enforcing the policy with the current code? Also, if we have get_policy(),
can we remove that usage in hugetlbfs_fallocate() after this patch? With
shared policy, shouldn't we be able to fetch the policy via get_vma_policy()?

A related question: does shm_pseudo_vma_init() require that mpolicy_lookup?

-aneesh