On Fri, Jul 24, 2020 at 7:34 PM Michal Hocko <mhocko@xxxxxxxxxx> wrote:
>
> On Fri 24-07-20 18:03:06, Muchun Song wrote:
> > In the reservation routine, we only check whether the cpuset meets
> > the memory allocation requirements, but we ignore the mempolicy in
> > the MPOL_BIND case. An mmap() of hugetlb pages can succeed while the
> > subsequent page allocation fails due to the mempolicy restriction,
> > and the process then receives SIGBUS. This can be reproduced with
> > the following steps.
> >
> > 1) Compile the test case.
> >    cd tools/testing/selftests/vm/
> >    gcc map_hugetlb.c -o map_hugetlb
> >
> > 2) Pre-allocate huge pages. Suppose there are 2 NUMA nodes in the
> >    system; each node will pre-allocate one huge page.
> >    echo 2 > /proc/sys/vm/nr_hugepages
> >
> > 3) Run the test case (mmap 4MB). We receive the SIGBUS signal.
> >    numactl --membind=0 ./map_hugetlb 4
> >
> > With this patch applied, the mmap in step 3) fails with
> > "mmap: Cannot allocate memory".
> >
> > Reported-by: Jianchao Guo <guojianchao@xxxxxxxxxxxxx>
> > Signed-off-by: Muchun Song <songmuchun@xxxxxxxxxxxxx>
> > ---
> >
> > changelog in v2:
> >  1) Reuse policy_nodemask().
> >
> >  include/linux/mempolicy.h |  1 +
> >  mm/hugetlb.c              | 19 ++++++++++++++++---
> >  mm/mempolicy.c            |  2 +-
> >  3 files changed, 18 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
> > index ea9c15b60a96..6b9640f1c990 100644
> > --- a/include/linux/mempolicy.h
> > +++ b/include/linux/mempolicy.h
> > @@ -152,6 +152,7 @@ extern int huge_node(struct vm_area_struct *vma,
> >  extern bool init_nodemask_of_mempolicy(nodemask_t *mask);
> >  extern bool mempolicy_nodemask_intersects(struct task_struct *tsk,
> >  				const nodemask_t *mask);
> > +extern nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy);
> >  extern unsigned int mempolicy_slab_node(void);
> >
> >  extern enum zone_type policy_zone;
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 589c330df4db..a753fe8591b4 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -3463,12 +3463,25 @@ static int __init default_hugepagesz_setup(char *s)
> >  }
> >  __setup("default_hugepagesz=", default_hugepagesz_setup);
> >
> > -static unsigned int cpuset_mems_nr(unsigned int *array)
> > +static unsigned int allowed_mems_nr(struct hstate *h)
> >  {
> >  	int node;
> >  	unsigned int nr = 0;
> > +	struct mempolicy *mpol = get_task_policy(current);
> > +	nodemask_t *mpol_allowed, *mems_allowed, nodemask;
> > +	unsigned int *array = h->free_huge_pages_node;
> > +	gfp_t gfp_mask = htlb_alloc_mask(h);
> > +
> > +	mpol_allowed = policy_nodemask(gfp_mask, mpol);
> > +	if (mpol_allowed) {
> > +		nodes_and(nodemask, cpuset_current_mems_allowed,
> > +			  *mpol_allowed);
> > +		mems_allowed = &nodemask;
> > +	} else {
> > +		mems_allowed = &cpuset_current_mems_allowed;
> > +	}
>
> I believe you can simplify this and use a pattern similar to the page
> allocator. Something like
>
>         for_each_node_mask(node, *mpol_allowed) {
>                 if (node_isset(node, cpuset_current_mems_allowed))
>                         nr += array[node];
>         }
>
> There shouldn't be any need to allocate a potentially large nodemask on
> the stack.

An unsigned long can hold 64 nodes, so I think that nodemask uses very
little stack memory. Right?
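
To put a number on that: a nodemask_t is just a fixed-size bitmap whose
width is set by CONFIG_NODES_SHIFT. Below is a minimal userspace sketch
of the size arithmetic (the kernel names are mimicked here purely for
illustration, and NODES_SHIFT=6 is only an assumed config value):

        #include <stdio.h>

        /*
         * Mimic the kernel's nodemask_t (include/linux/nodemask.h): a
         * bitmap of MAX_NUMNODES bits stored in unsigned longs.
         */
        #define NODES_SHIFT      6                  /* assumed config value */
        #define MAX_NUMNODES     (1 << NODES_SHIFT) /* 64 possible nodes */
        #define BITS_PER_LONG    (8 * sizeof(unsigned long))
        #define BITS_TO_LONGS(n) (((n) + BITS_PER_LONG - 1) / BITS_PER_LONG)

        typedef struct {
                unsigned long bits[BITS_TO_LONGS(MAX_NUMNODES)];
        } nodemask_t;

        int main(void)
        {
                /* With NODES_SHIFT=6 this prints 8: one unsigned long. */
                printf("sizeof(nodemask_t) = %zu bytes\n", sizeof(nodemask_t));
                return 0;
        }

If I read the Kconfig limits right, even the maximum CONFIG_NODES_SHIFT
of 10 only makes that 128 bytes of stack.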
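
That said, if you would rather avoid the intersection entirely, would
something like the following work for v3? This is an untested sketch on
top of this patch; note that policy_nodemask() returns NULL for
non-MPOL_BIND policies, so the loop iterates the cpuset mask instead:

        static unsigned int allowed_mems_nr(struct hstate *h)
        {
                int node;
                unsigned int nr = 0;
                nodemask_t *mpol_allowed;
                unsigned int *array = h->free_huge_pages_node;
                gfp_t gfp_mask = htlb_alloc_mask(h);

                mpol_allowed = policy_nodemask(gfp_mask, get_task_policy(current));

                for_each_node_mask(node, cpuset_current_mems_allowed) {
                        /* No mempolicy restriction, or the node is allowed. */
                        if (!mpol_allowed || node_isset(node, *mpol_allowed))
                                nr += array[node];
                }

                return nr;
        }

> --
> Michal Hocko
> SUSE Labs

--
Yours,
Muchun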