On Fri 24-07-20 18:03:06, Muchun Song wrote:
> In the reservation routine, we only check whether the cpuset meets
> the memory allocation requirements, but we ignore the mempolicy in
> the MPOL_BIND case. An mmap of hugetlb memory can then succeed while
> the subsequent allocation fails due to mempolicy restrictions, and
> the process receives SIGBUS. This can be reproduced by the following
> steps.
>
> 1) Compile the test case.
>    cd tools/testing/selftests/vm/
>    gcc map_hugetlb.c -o map_hugetlb
>
> 2) Pre-allocate huge pages. Suppose there are 2 numa nodes in the
>    system. Each node will pre-allocate one huge page.
>    echo 2 > /proc/sys/vm/nr_hugepages
>
> 3) Run the test case (mmap 4MB). We receive the SIGBUS signal.
>    numactl --membind=0 ./map_hugetlb 4
>
> With this patch applied, the mmap fails in step 3) with
> "mmap: Cannot allocate memory".
>
> Reported-by: Jianchao Guo <guojianchao@xxxxxxxxxxxxx>
> Signed-off-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
> ---
>
> changelog in v2:
>  1) Reuse policy_nodemask().
>
>  include/linux/mempolicy.h |  1 +
>  mm/hugetlb.c              | 19 ++++++++++++++++---
>  mm/mempolicy.c            |  2 +-
>  3 files changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
> index ea9c15b60a96..6b9640f1c990 100644
> --- a/include/linux/mempolicy.h
> +++ b/include/linux/mempolicy.h
> @@ -152,6 +152,7 @@ extern int huge_node(struct vm_area_struct *vma,
>  extern bool init_nodemask_of_mempolicy(nodemask_t *mask);
>  extern bool mempolicy_nodemask_intersects(struct task_struct *tsk,
>  				const nodemask_t *mask);
> +extern nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy);
>  extern unsigned int mempolicy_slab_node(void);
>
>  extern enum zone_type policy_zone;
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 589c330df4db..a753fe8591b4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3463,12 +3463,25 @@ static int __init default_hugepagesz_setup(char *s)
>  }
>  __setup("default_hugepagesz=", default_hugepagesz_setup);
>
> -static unsigned int cpuset_mems_nr(unsigned int *array)
> +static unsigned int allowed_mems_nr(struct hstate *h)
>  {
>  	int node;
>  	unsigned int nr = 0;
> +	struct mempolicy *mpol = get_task_policy(current);
> +	nodemask_t *mpol_allowed, *mems_allowed, nodemask;
> +	unsigned int *array = h->free_huge_pages_node;
> +	gfp_t gfp_mask = htlb_alloc_mask(h);
> +
> +	mpol_allowed = policy_nodemask(gfp_mask, mpol);
> +	if (mpol_allowed) {
> +		nodes_and(nodemask, cpuset_current_mems_allowed,
> +			  *mpol_allowed);
> +		mems_allowed = &nodemask;
> +	} else {
> +		mems_allowed = &cpuset_current_mems_allowed;
> +	}

I believe you can simplify this and use a pattern similar to the page
allocator. Something like

	for_each_node_mask(node, *mpol_allowed) {
		if (node_isset(node, cpuset_current_mems_allowed))
			nr += array[node];
	}

There shouldn't be any need to allocate a potentially large nodemask
on the stack.

-- 
Michal Hocko
SUSE Labs
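
For reference, a minimal sketch of what allowed_mems_nr() could look
like with the pattern suggested above. This is an illustration, not
the posted patch; it assumes policy_nodemask() returns NULL whenever
the mempolicy places no node restriction (e.g. the default policy),
in which case only the cpuset mask applies:

	static unsigned int allowed_mems_nr(struct hstate *h)
	{
		int node;
		unsigned int nr = 0;
		nodemask_t *mpol_allowed;
		unsigned int *array = h->free_huge_pages_node;
		gfp_t gfp_mask = htlb_alloc_mask(h);

		/* NULL means the mempolicy does not restrict the nodes. */
		mpol_allowed = policy_nodemask(gfp_mask, get_task_policy(current));

		/*
		 * Walk the cpuset-allowed nodes and count free huge pages
		 * only on nodes the mempolicy also permits.
		 */
		for_each_node_mask(node, cpuset_current_mems_allowed)
			if (!mpol_allowed || node_isset(node, *mpol_allowed))
				nr += array[node];

		return nr;
	}

This intersects the two masks on the fly, so neither nodes_and() nor
an on-stack nodemask is needed.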