On 5/25/19 8:28 PM, Andrew Morton wrote:
> (Cc Vlastimil)

Oh dear, 2 years and I forgot all the details about how this works.

> On Sat, 25 May 2019 15:07:23 +0800 zhong jiang <zhongjiang@xxxxxxxxxx> wrote:
>
>> We bind a different node to each vma. Unluckily, checking
>> /proc/pid/numa_maps shows that different vmas end up bound to the same node.
>> Commit 213980c0f23b ("mm, mempolicy: simplify rebinding mempolicies when updating cpusets")
>> introduced the issue. When we change the memory policy by setting cpuset.mems,
>> a process will rebind the specified policy more than once.
>> If cpuset_mems_allowed is not equal to the user-specified nodes, the issue triggers.
>> This may result in out of memory, since memory is allocated from the same node.

I have a hard time understanding what the problem is. Could you please
write it as a (pseudo) reproducer? I.e. an example of the process/admin
mempolicy/cpuset actions that have some wrong observed results vs the
correct expected result.

>> --- a/mm/mempolicy.c
>> +++ b/mm/mempolicy.c
>> @@ -345,7 +345,7 @@ static void mpol_rebind_nodemask(struct mempolicy *pol, const nodemask_t *nodes)
>>  	else {
>>  		nodes_remap(tmp, pol->v.nodes,pol->w.cpuset_mems_allowed,
>>  								*nodes);
>> -		pol->w.cpuset_mems_allowed = tmp;
>> +		pol->w.cpuset_mems_allowed = *nodes;

Looks like a mechanical error on my side when removing the code for
step1+step2 rebinding. Before my commit there was

	pol->w.cpuset_mems_allowed = step ? tmp : *nodes;

Since 'step' was removed and thus 0, I should have used *nodes indeed.
Thanks for catching that.

>>  	}
>>
>>  	if (nodes_empty(tmp))
>
> hm, I'm not surprised the code broke.  What the heck is going on in
> there?  It used to have a perfunctory comment, but Vlastimil deleted
> it.

Yeah the comment was specific for the case that was being removed.

> Could someone please propose a comment for the above code block
> explaining why we're doing what we do?

I'll have to relearn this first...
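
To make the effect of the one-line change easier to see, here is a rough
userspace model of the quoted branch of mpol_rebind_nodemask(). It is only
a sketch and not the kernel code: nodemasks are plain unsigned ints,
struct mempolicy_model, remap() and rebind() are hypothetical stand-ins,
and the 'fixed' flag is invented to switch between storing tmp (the code
being removed by the patch) and storing the new mask (the patch). remap()
follows my understanding of the ordinal-based mapping done by
nodes_remap(). The two-rebind scenario is an assumed illustration, not the
reporter's reproducer.

```c
#include <assert.h>

/* Userspace model only: one bit per node, at most 32 nodes. */
typedef unsigned int nodemask_t;
#define MAX_NODES 32

struct mempolicy_model {
	nodemask_t nodes;               /* models pol->v.nodes */
	nodemask_t cpuset_mems_allowed; /* models pol->w.cpuset_mems_allowed */
};

static int weight(nodemask_t m)
{
	int w = 0;

	for (; m; m &= m - 1)
		w++;
	return w;
}

/* Ordinal of 'bit' among the set bits of 'm', or -1 if 'bit' is clear. */
static int pos_to_ord(nodemask_t m, int bit)
{
	if (!(m & (1u << bit)))
		return -1;
	return weight(m & ((1u << bit) - 1));
}

/* Position of the ord-th (0-based) set bit of 'm'. */
static int ord_to_pos(nodemask_t m, int ord)
{
	assert(ord >= 0 && ord < weight(m));
	for (int bit = 0; bit < MAX_NODES; bit++)
		if ((m & (1u << bit)) && ord-- == 0)
			return bit;
	return -1; /* not reached when the assertion holds */
}

/* Map each node of 'src' from its ordinal in 'old' to the same ordinal
 * (modulo the weight) in 'new_mask'; nodes not in 'old' map to themselves. */
static nodemask_t remap(nodemask_t src, nodemask_t old, nodemask_t new_mask)
{
	nodemask_t dst = 0;
	int w = weight(new_mask);

	for (int bit = 0; bit < MAX_NODES; bit++) {
		if (!(src & (1u << bit)))
			continue;
		int n = pos_to_ord(old, bit);
		if (n < 0 || w == 0)
			dst |= 1u << bit;
		else
			dst |= 1u << ord_to_pos(new_mask, n % w);
	}
	return dst;
}

/* Models the quoted else-branch plus the nodes_empty() fallback.
 * fixed == 0 stores tmp (pre-patch), fixed != 0 stores the new mask. */
static void rebind(struct mempolicy_model *pol, nodemask_t new_mask, int fixed)
{
	nodemask_t tmp = remap(pol->nodes, pol->cpuset_mems_allowed, new_mask);

	pol->cpuset_mems_allowed = fixed ? new_mask : tmp;
	if (tmp == 0)
		tmp = new_mask;
	pol->nodes = tmp;
}
```

In this model, take two policies bound to node 0 and node 1 with nodes
0-3 allowed, rebind both to nodes 2-3 and then to nodes 0-1. With
fixed == 0 both policies end up on node 0, because each one remembers only
its own single remapped node as the "old" mask, so every node has ordinal 0
on the second rebind. With fixed != 0 they land on nodes 0 and 1 again.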